Skip to main content

groupByJson

Groups JSON arrays with matching values in one or more fields. For each group, the function aggregates the data for one more fields. Sum aggregation applies by default, but you can pick a different type.

The function returns a list of lists, with each inner list displaying the grouping values and aggregated values for a unique group. The lists are sorted by the first aggregated value in descending numerical order.

Note

Use this function if your JSON data is structured as a two-dimensional array — i.e. an array of arrays. Use groupByJsonObject if your data is an array of objects.

Parameters

  • INPUT DATA (JSON array)

    An array containing other arrays. Each inner array must contain the same number of fields.

  • GROUPING FIELDS (number or list)

    One or more fields to group by. Fields must be referenced by their index number. The first field in an array is Index 0, the second field is Index 1, and so on. For multiple fields, give a list of index numbers.

  • AGGREGATION FIELDS (number or list)

    One or more fields with numeric values. Fields must be referenced by their index number. The first field in the array is Index 0, the second field Index is 1, and so on. For multiple fields, give a list of index numbers.

  • AGGREGATION FUNCTIONS (function or list)

    Optional. The functions for aggregating the values in the aggregation fields. The number of aggregation functions should match the number of aggregation fields. The functions and fields should be in matching order. working left to right through the ATL.

Aggregation functions

Function

Purpose

Note

count()

Counts the number of inner arrays.

Counts null or empty values.

maxVal()

Calculates the maximum value.

Ignores null or empty values.

mean()

Calculates the mean value.

Ignores null or empty values.

minVal()

Calculates the minimum value.

Ignores null or empty values.

sum()

Calculates the sum total.

Ignores null or empty values.

Examples

Assume a "Describe a JSON Object" project with this data:

{
    "finances": [
	[1111, "California", "Groceries", 50235.45, 3014.07],
	[1112, "California", "Groceries", 47221.90, 3069.48],
	[2121, "Texas", "Clothing", 44476.12, 2668.57],
	[2111, "Texas", "Groceries", 55713.66, 3285.32],
	[1113, "California", "Groceries", 67315.85, 4712.11],
	[3121, "New York", "Clothing", 66391.40, 4647.27],
	[1121, "California", "Clothing", 24059.78, 1684.18],
	[2122, "Texas", "Clothing", 40727.24, 2565.63],
	[3111, "New York", "Groceries", 42457.60, 2547.45],
	[2112, "Texas", "Groceries", 61616.90, 3697.01],
	[1122, "California", "Clothing", 19393.42, 1376.93],
	[3122, "New York", "Clothing", 70781.55, 4961.67]
    ]
}

The main array is "finances". There are 12 inner arrays, each containing five fields. The five fields are indexed 0 to 4. Field 0 is the branch ID (e.g. 1111), Field 1 is the region (e.g. California), Field 2 is the store type, Field 3 is the sales total, and Field 4 is the profit total.

You can use these field index numbers to specify the grouping and aggregation fields.

Example 1 — Single grouping field with single aggregation

Suppose you want to know how each region performed for total profit. You can group the inner arrays by Region (Index 1) and then sum-aggregate the Profit (Index 4) values for each group.

The ATL is: [[groupByJson(WholeJSON.finances, 1, 4, sum())]].

The output is this list of lists:

( ("California", 13856.77), ("Texas", 12216.53), ("New York", 12156.39) )

Each inner list contains values for a distinct group. For each inner list, the first item is the region name, and the second item is the sum-total profit. The list is sorted by the first aggregated value in descending order.

The printed output is:

California and 13,856.77, Texas and 12,216.53 and New York and 12,156.39

Here's how you might extract values from the result to produce narrative text:

[[global.regions = groupByJson(WholeJSON.finances, 1, 4, sum()); global.topOne = top(regions,1);'']]

The best performing region was [[topOne[0][0]]], with profits of [[currencyFormat(topOne[0][1])]].

[[sorted = sort(regions, (x,y) -> sign(x[1] - y[1])); global.bottomOne = top(sorted,1);'']]

The worst performing region was [[bottomOne[0][0]]], with profits of [[currencyFormat(bottomOne[0][1])]].

The printed result is:

The best performing region was California, with profits of $13,856.77.

The worst performing region was New York, with profits of $12,156.39.

Example 2 — Single grouping field with multiple aggregations

Sometimes you need to specify multiple aggregation fields and multiple aggregation types. Suppose you want to produce a narrative that identifies (1) the regions that achieved average sales over $50K, and (2) the number of branches in each successful region.

To calculate average sales, you must group by Region (Index 1) and mean-aggregate the Sales totals (Index 3) for each group. To calculate the number of branches in each region, you must group by Region and then apply count() aggregation to the Branch ID field (Index 0).

The ATL is [[groupByJson(WholeJSON.finances, 1, (3,0), (mean(),count()))]].

The output is this list or lists:

( ("New York", 59876.85, 3), ("Texas", 50633.48, 4), ("California", 41645.08, 5) )

The inner list contains values for a distinct group. For each inner list, the first item is the region name, the second item is the average sales figure, and the third item is the number of unique branches in the group. The printed output for the ATL is:

New York, 59,876.85 and 3, Texas, 50,633.48 and 4 and California, 41,645.08 and 5

Here's how you might extract values from the result to produce narrative text:

[[global.over = filter(groupByJson(WholeJSON.finances, 1, (3,0), (mean(),count())), x -> x[1] >= 50000);'']]

[[global.regionsOver = map(over, x -> x[0]);'']]

[[regionsOver]] were the only regions with average sales over $50K.

[[over[0][0]]] averaged [[currencyFormat(over[0][1])]] from [[numToWords(over[0][2],'',10)]] branches.

[[over[1][0]]] averaged [[currencyFormat(over[1][1])]] from [[numToWords(over[1][2],'',10)]] branches.

The printed result is:

New York and Texas were the only regions with average sales over $50K.

New York averaged $59,876.85 from three branches.

Texas averaged $50,633.48 from four branches.

Example 3 — Multiple grouping fields with multiple aggregations

The previous examples use a single grouping field. This example demonstrates the benefits of using two or more fields (second parameter) to make your grouping criteria more precise.

Suppose you want to group by Region and Type. Also, suppose that you wish to discover (1) the Region + Type group with the highest sales, and (2) the Region + Type group with the highest average profit.

The ATL is [[groupByJson(WholeJSON.finances, (1,2), (3,4), (sum(),mean()))]].

The output is this list of lists:

( ("California", "Groceries", 164772.2, 3598.55), ("New York", "Clothing", 137172.95, 4804.47), ("Texas", "Groceries", 117330.56, 3491.17), ("Texas", "Clothing", 85203.36, 2617.1), ("California", "Clothing", 43453.2, 1530.56), ("New York", "Groceries", 42457.6, 2547.45) )

Each inner list contains values for a distinct group. For each inner list, the first item is the region name, the second item is the store type, the third item is the sum-total for sales, and the fourth item is the mean profit.

The printed output is:

California, Groceries, 164,772.2 and 3,598.55, New York, Clothing, 137,172.95 and 4,804.47, Texas, Groceries, 117,330.56 and 3,491.17, Texas, Clothing, 85,203.36 and 2,617.1, California, Clothing, 43,453.2 and 1,530.56 and New York, Groceries, 42,457.6 and 2,547.45

Here's how you might extract values from the result to produce narrative text:

[[global.results = groupByJson(WholeJSON.finances, (1,2), (3,4), (sum(),mean())); global.topSales = top(results,1);'']]

[[topSales[0][1]]] stores in [[topSales[0][0]]] achieved the best total sales ([[abbreviateNumber(currencyFormat(topSales[0][2]))]]).

[[sortedByProfit = sort(results, (x,y) -> sign(y[3] - x[3])); global.topProfit = top(sortedByProfit,1);'']]

[[topProfit[0][1]]] stores in [[topProfit[0][0]]] achieved the best average profit ([[abbreviateNumber(currencyFormat(topProfit[0][3]))]]).

The printed result is:

Groceries stores in California achieved the best total sales ($164.8K).

Clothing stores in New York achieved the best average profit ($4.8K).