Skip to main content

Grouping

A common task is to group data by common values before calculating aggregated totals for each group. For example, you might group your data by product name to get a total sales value for each product.

In some cases, you might want to group by two or more dimensions. For example, you might group by product name and city name, then get an aggregated sales value for each product in each city.

Studio has three ATL functions that perform this kind of task, each performing the same operation but on different data structures. groupByTable works on table data, groupByJson works on two-dimensional arrays, and groupByJsonObject works on arrays of objects.

The fourth grouping function — groupByVarianceTable — performs a slightly different analysis. It groups and aggregates table data just like groupByTable does, but it does so as part of a drilldown analysis that identifies drivers and offsets of a parent variance trend.

Index

 

How to group table rows

The ATL function for grouping table rows is groupByTable.

This function is available in "Describe the Table" and "Describe Row in Context" projects only.

Example for groupByTable

Assume a "Describe the Table" project with this data:

ID

City

Region

Quarter

Year

Sales

Row 1

1

Atlanta

South

Q1

2022

56,000

Row 2

2

Los Angeles

West

Q1

2022

52,000

Row 3

3

Dallas

South

Q1

2022

44,000

Row 4

4

New York

East

Q1

2022

57,000

Row 5

5

Pittsburgh

East

Q1

2022

53,000

Row 6

6

Atlanta

South

Q2

2022

71,000

Row 7

7

Los Angeles

West

Q2

2022

73,000

Row 8

8

Dallas

South

Q2

2022

66,000

Row 9

9

New York

East

Q2

2022

48,000

Row 10

10

Pittsburgh

East

Q2

2022

36,000

Suppose you want to find which Region achieved the highest Sales total.

The best solution is to use groupByTable. The ATL is:

[[groupByTable(WholeTable, Region, Sales, sum())]]

This tells Studio to analyze the whole table, group all rows with matching Region values, and aggregate the Sales values for each group. The final parameter input — sum() — specifies sum aggregation.

The output is this list of lists:

( ("South", 237000), ("East", 194000), ("West", 125000) )

There are three inner lists — one for each group — and each contains two values: the group's Region value and its sum-aggregated Sales value. The results are sorted by the aggregated values in descending order.

The printed result is:

South and 237,000, East and 194,000 and West and 125,000

Here's how you might turn the result into coherent narrative text:

[[

allResults = groupByTable(WholeTable, Region, Sales, sum())

topResult = allResults[0:1]

topRegionName = topResult[0][0]

topRegionSales = abbreviateNumber(currencyFormat(topResult[0][1]))

joinStrings('the highest Sales total (', topRegionSales, ') was in the ', topRegionName, ' region.')

]]
  • The first line runs a groupByTable analysis.

  • The second line uses list slicing to get the top result — i.e. the first inner list.

  • The third and fourth lines get the top region's name and sales total.

  • The last line uses joinStrings to join computed values with fixed text.

The printed result is:

The highest Sales total ($237K) was in the South region.

This example has one grouping column (Region), one aggregation column (Sales), and one aggregation type (sum). The groupByTable function can perform more complex analyses — see the function topic for examples.

How to group JSON objects

The ATL function for grouping JSON objects is groupByJsonObject.

This function is available in "Describe a JSON Object" projects only.

Note

If your JSON data is a two-dimensional array (an array of arrays), use groupByJson.

Example for groupByJsonObject

Assume a "Describe a JSON Object" project with this data:

{
  "data": [
    {"ID": 1, "City": "Atlanta", "Region": "South", "Quarter": "Q1", "Year": 2022, "Sales": 56000},
    {"ID": 2, "City": "Los Angeles", "Region": "West", "Quarter": "Q1", "Year": 2022, "Sales": 52000},
    {"ID": 3, "City": "Dallas", "Region": "South", "Quarter": "Q1", "Year": 2022, "Sales": 44000},
    {"ID": 4, "City": "New York", "Region": "East", "Quarter": "Q1", "Year": 2022, "Sales": 57000},
    {"ID": 5, "City": "Pittsburgh", "Region": "East", "Quarter": "Q1", "Year": 2022, "Sales": 53000},
    {"ID": 6, "City": "Atlanta", "Region": "South", "Quarter": "Q2", "Year": 2022, "Sales": 71000},
    {"ID": 7, "City": "Los Angeles", "Region": "West", "Quarter": "Q2", "Year": 2022, "Sales": 73000},
    {"ID": 8, "City": "Dallas", "Region": "South", "Quarter": "Q2", "Year": 2022, "Sales": 66000},
    {"ID": 9, "City": "New York", "Region": "East", "Quarter": "Q2", "Year": 2022, "Sales": 48000},
    {"ID": 10, "City": "Pittsburgh", "Region": "East", "Quarter": "Q2", "Year": 2022, "Sales": 36000}
  ]
}

The "data" array contains 10 JSON objects, each containing six key–value pairs.

Suppose you want to find which City achieved the highest Sales total.

The best solution is to use groupByJsonObject. The ATL is:

[[groupByJsonObject(WholeJSON.data, 'City', 'Sales', sum())]]

This tells Studio to analyze the whole "data" array, group all objects with matching City values, and aggregate the Sales values for each group. The final parameter input — sum() — specifies sum aggregation.

The output is this list of lists:

( ("Atlanta", 127000), ("Los Angeles", 125000), ("Dallas", 110000), ("New York", 105000), ("Pittsburgh", 89000) )

There are five inner lists — one for each group — and each contains two values: the groups City value and its sum-aggregated Sales value. The results are sorted by the aggregated values in descending order.

The printed result is:

Atlanta and 127,000, Los Angeles and 125,000, Dallas and 110,000, New York and 105,000 and Pittsburgh and 89,000

Here's how you might turn the result into coherent narrative text:

[[

allResults = groupByJsonObject(WholeJSON.data, 'City', 'Sales', sum())

topResult = allResults[0:1]

topCityName = topResult[0][0]

topCitySales = abbreviateNumber(currencyFormat(topResult[0][1]))

joinStrings(topCityName, ' achieved the best sales, with a total of ', topCitySales, '.')

]]
  • The first line runs a groupByJsonObject analysis.

  • The second line uses list slicing to get the top result — i.e. the first inner list.

  • The third and fourth lines get the top city's name and sales total.

  • The last line uses joinStrings to join computed values with fixed text.

The printed result is:

Atlanta achieved the best sales, with a total of $127K.

This example has one grouping field (City), one aggregation field (Sales), and one aggregation type (sum). The groupByJsonObject function can perform more complex analyses — see the function topic for examples.

How to group JSON arrays

The function for grouping JSON arrays is groupByJson.

This function is available in "Describe a JSON Object" projects only.

Note

If your JSON data is an array of objects, use groupByJsonObject.

Example for groupByJson

Assume a "Describe a JSON Object" project with this data:

{
    "data": [
	[1, "Atlanta", "South", "Q1", 2022, 56000],
	[2, "Los Angeles", "West", "Q1", 2022, 52000],
	[3, "Dallas", "South", "Q1", 2022, 44000],
	[4, "New York", "East", "Q1", 2022, 57000],
	[5, "Pittsburgh", "East", "Q1", 2022, 53000],
	[6, "Atlanta", "South", "Q2", 2022, 71000],
	[7, "Los Angeles", "West", "Q2", 2022, 73000],
	[8, "Dallas", "South", "Q2", 2022, 66000],
	[9, "New York", "East", "Q2", 2022, 48000],
	[10, "Pittsburgh", "East", "Q2", 2022, 36000]
    ]
}

The "data" array contains 10 inner arrays, each containing six values. The values are indexed 0 to 5. The ID value is at index 0, the City value is at index 1, and so on. You must use the index numbers for reference.

Suppose you want to report on the top three cities for average sales.

The best solution is to use groupByJson. The ATL is:

[[groupByJson(WholeJSON.data, 1, 5, mean())]]

This tells Studio to analyze the whole "data" array, group the arrays with matching City values (Index 1), and aggregate the Sales values (Index 5) for each group. The final parameter input — mean() — specifies mean aggregation.

The result is this list of lists:

( ("Atlanta", 63500), ("Los Angeles", 62500), ("Dallas", 55000), ("New York", 52500), ("Pittsburgh", 44500) )

There are five inner lists — one for each group — and each contains two values: the groups City value and its mean-aggregated Sales value. The results are sorted by the aggregated values in descending order.

The printed result is:

Atlanta and 63,500, Los Angeles and 62,500, Dallas and 55,000, New York and 52,500 and Pittsburgh and 44,500

Here's how you might turn the result into coherent narrative text:

[[

allResults = groupByJson(WholeJSON.salesData, 1, 5, mean())

topResults = allResults[0:3]

topThreeCities = map(topResults, x -> x[0])

topThreeSales = map(topResults, x -> abbreviateNumber(currencyFormat(x[1])))

"The top three cities for average sales were [[topThreeCities]], with [[topThreeSales]] respectively."

]]
  • The first line runs a groupByJson analysis.

  • The second line uses list slicing to get the top three results (first three inner lists).

  • The third line maps the top three results to a list of city names.

  • The fourth line maps the top three results to a list of aggregated sales values.

  • The last line combines fixed text with computed values.

The printed result is:

The top three cities for average sales were Atlanta, Los Angeles and Dallas, with $63.5K, $62.5K and $55K respectively.

The groupByJson function can perform more complex analyses — see the function topic for examples.

How to group for variance analysis

The groupByVarianceTable function calculates the variance between sum-aggregated values for two measures (base and comparison), then breaks the variance down by one or more grouping dimensions.

You might use the function to identify a parent variance trend — e.g. Sales exceeding Target — then drilldown by Country to identify whether sales in specific countries drove or offset that trend.

See the function topic for guidance and examples.