anomalyDetectorOutliers

Applies Grubbs’ test to detect anomalies (statistical outliers) in a series of numbers assumed to come from a normally distributed population.

The result is a list of ATL objects, one for each detected anomaly. An example object is:

( label = March, value = -8,500, anomalyCategory = outlier, anomalyClass = low, nearestNormalLabel = January, nearestNormalValue = 5,900, normalMean = 7,088.89, normalStdDev = 826.81 )

See Result object fields for more about the results object.

The function returns an empty list when there are no anomalies.

Parameters

SERIES (list)
The input series of numbers.
This must be a list or list-like object (e.g. a column or row) containing at least seven numbers.
LABELS (list)
A list of labels, one for each number in the input series.
This must be a list or list-like object containing strings.

Notes

The function removes any null values from SERIES before performing its calculation.
If LABELS contains a null value, the function uses 'No Label' for that label in the result.
If SERIES is longer than LABELS, the function automatically truncates SERIES to match the length of LABELS.

Result object fields

Each result object contains these fields:

Field	Description
label	The label for the anomalous value.
value	The anomalous value.
anomalyCategory	This is always outlier for anomalies detected by this function.
anomalyClass	Indicates whether the anomaly is high or low.
nearestNormalLabel	The label for the value that is nearest in value to the anomaly but is not itself anomalous.
nearestNormalValue	The value that is nearest in value to the anomaly but is not itself anomalous.
normalMean	The mean of all values in the comparison group excluding anomalies.
normalStdDev	The standard deviation for all values in the comparison group excluding anomalies.

Examples

Assume a "Describe the Table" project with this data:

	Code	Month	Orders	salesRev	COGS	otherRev	otherExp	netProfit
Row 1	1001	January	12	14,000.00	7,500.00	1,000.00	1,500.00	5,900.00
Row 2	1002	February	15	15,500.00	7,000.00	1,000.00	1,500.00	8,000.00
Row 3	1003	March	11	12,750.00	9,500.00	1,200.00	13,000.00	-8,500.00
Row 4	1004	April	17	16,250.00	8,000.00	1,000.00	1,500.00	7,750.00
Row 5	1005	May	18	16,000.00	8,250.00	1,200.00	1,500.00	7,450.00
Row 6	1006	June	43	40,000.00	20,000.00	1,250.00	1,500.00	19,750.00
Row 7	1007	July	35	32,500.00	17,750.00	1,250.00	1,500.00	14,500.00
Row 8	1008	August	16	15,750.00	7,500.00	1,250.00	1,500.00	8,200.00
Row 9	1009	September	14	14,250.00	7,000.00	1,000.00	1,500.00	6,750.00
Row 10	1010	October	16	15,500.00	8,000.00	1,000.00	1,500.00	7,000.00
Row 11	1011	November	15	14,500.00	8,000.00	1,000.00	1,500.00	6,000.00
Row 12	1012	December	15	15,250.00	8,000.00	1,000.00	1,500.00	6,750.00

Suppose you want to find the outliers for Orders and identify them by Month.

The ATL [[anomalyDetectorOutliers(Orders, Month)]] returns this result:

(

( label = June, value = 43, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = May, nearestNormalValue = 18, normalMean = 14.9, normalStdDev = 2.13 ),

( label = July, value = 35, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = May, nearestNormalValue = 18, normalMean = 14.9, normalStdDev = 2.13 )

)

The results list contains two ATL objects. Each object is a series of key–value pairs; and from these you can identify the anomalous values (43 and 35), the months they occurred (June and July), and whether they are high or low outliers.

Note

The function detects statistical outliers; therefore, the anomalyCategory value is always outlier. The anomalyClass value is either high or low depending on whether the outlier is above or below the mean.

Now, suppose you want to find the outliers for netProfit and identify them by Month.

The ATL is [[anomalyDetectorOutliers(netProfit, Month)]] and the result is:

(

( label = March, value = -8,500, anomalyCategory = outlier, anomalyClass = low, nearestNormalLabel = January, nearestNormalValue = 5,900, normalMean = 7,088.89, normalStdDev = 826.81 ),

( label = June, value = 19,750, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = August, nearestNormalValue = 8,200, normalMean = 7,088.89, normalStdDev = 826.81 ),

( label = July, value = 14,500, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = August, nearestNormalValue = 8,200, normalMean = 7,088.89, normalStdDev = 826.81 )

)

This time the function finds three outliers. Two are high outliers and one is low.

Tip

You can filter by anomalyClass to get the high outliers only or the low outliers only:

[[filter(anomalyDetectorOutliers(netProfit, Month), x -> x.anomalyClass == 'high')]]

[[filter(anomalyDetectorOutliers(netProfit, Month), x -> x.anomalyClass == 'low')]]

Here's how you might extract values from the result to produce narrative text:

[[

results = anomalyDetectorOutliers(netProfit, Month)

highOnly = filter(results, x -> x.anomalyClass == 'high')

lowOnly = filter(results, x -> x.anomalyClass == 'low')

highResults = map(highOnly, x -> joinStrings(x.label, ' (', abbreviateNumber(currencyFormat(x.value)), ')'))

lowResults = map(lowOnly, x -> joinStrings(x.label, ' (', abbreviateNumber(currencyFormat(x.value)), ')'))

"Net Profit was unusually high in [[highResults]] and unusually low in [[lowResults]]."

]]

The output text is:

Net Profit was unusually high in June ($19.8K) and July ($14.5K) and unusually low in March (-$8.5K).