anomalyDetectorOutliers
Applies Grubbs’ test to detect anomalies (statistical outliers) in a series of numbers assumed to come from a normally distributed population.
The result is a list of ATL objects, one for each detected anomaly. An example object is:
|
See Result object fields for more about the results object.
The function returns an empty list when there are no anomalies.
Parameters
SERIES (list)
The input series of numbers.
This must be a list or list-like object (e.g. a column or row) containing at least seven numbers.
LABELS (list)
A list of labels, one for each number in the input series.
This must be a list or list-like object containing strings.
Notes
The function removes any null values from SERIES before performing its calculation.
If LABELS contains a null value, the function uses 'No Label' for that label in the result.
If SERIES is longer than LABELS, the function automatically truncates SERIES to match the length of LABELS.
Result object fields
Each result object contains these fields:
Field | Description |
---|---|
label | The label for the anomalous value. |
value | The anomalous value. |
anomalyCategory | This is always outlier for anomalies detected by this function. |
anomalyClass | Indicates whether the anomaly is high or low. |
nearestNormalLabel | The label for the value that is nearest in value to the anomaly but is not itself anomalous. |
nearestNormalValue | The value that is nearest in value to the anomaly but is not itself anomalous. |
normalMean | The mean of all values in the comparison group excluding anomalies. |
normalStdDev | The standard deviation for all values in the comparison group excluding anomalies. |
Examples
Assume a "Describe the Table" project with this data:
Code | Month | Orders | salesRev | COGS | otherRev | otherExp | netProfit | |
---|---|---|---|---|---|---|---|---|
Row 1 | 1001 | January | 12 | 14,000.00 | 7,500.00 | 1,000.00 | 1,500.00 | 5,900.00 |
Row 2 | 1002 | February | 15 | 15,500.00 | 7,000.00 | 1,000.00 | 1,500.00 | 8,000.00 |
Row 3 | 1003 | March | 11 | 12,750.00 | 9,500.00 | 1,200.00 | 13,000.00 | -8,500.00 |
Row 4 | 1004 | April | 17 | 16,250.00 | 8,000.00 | 1,000.00 | 1,500.00 | 7,750.00 |
Row 5 | 1005 | May | 18 | 16,000.00 | 8,250.00 | 1,200.00 | 1,500.00 | 7,450.00 |
Row 6 | 1006 | June | 43 | 40,000.00 | 20,000.00 | 1,250.00 | 1,500.00 | 19,750.00 |
Row 7 | 1007 | July | 35 | 32,500.00 | 17,750.00 | 1,250.00 | 1,500.00 | 14,500.00 |
Row 8 | 1008 | August | 16 | 15,750.00 | 7,500.00 | 1,250.00 | 1,500.00 | 8,200.00 |
Row 9 | 1009 | September | 14 | 14,250.00 | 7,000.00 | 1,000.00 | 1,500.00 | 6,750.00 |
Row 10 | 1010 | October | 16 | 15,500.00 | 8,000.00 | 1,000.00 | 1,500.00 | 7,000.00 |
Row 11 | 1011 | November | 15 | 14,500.00 | 8,000.00 | 1,000.00 | 1,500.00 | 6,000.00 |
Row 12 | 1012 | December | 15 | 15,250.00 | 8,000.00 | 1,000.00 | 1,500.00 | 6,750.00 |
Suppose you want to find the outliers for Orders and identify them by Month.
The ATL [[anomalyDetectorOutliers(Orders, Month)]]
returns this result:
|
The results list contains two ATL objects. Each object is a series of key–value pairs; and from these you can identify the anomalous values (43 and 35), the months they occurred (June and July), and whether they are high or low outliers.
Note
The function detects statistical outliers; therefore, the anomalyCategory value is always outlier. The anomalyClass value is either high or low depending on whether the outlier is above or below the mean.
Now, suppose you want to find the outliers for netProfit and identify them by Month.
The ATL is [[anomalyDetectorOutliers(netProfit, Month)]]
and the result is:
|
This time the function finds three outliers. Two are high outliers and one is low.
Tip
You can filter by anomalyClass to get the high outliers only or the low outliers only:
[[filter(anomalyDetectorOutliers(netProfit, Month), x -> x.anomalyClass == 'high')]]
[[filter(anomalyDetectorOutliers(netProfit, Month), x -> x.anomalyClass == 'low')]]
Here's how you might extract values from the result to produce narrative text:
[[ results = anomalyDetectorOutliers(netProfit, Month) highOnly = filter(results, x -> x.anomalyClass == 'high') lowOnly = filter(results, x -> x.anomalyClass == 'low') highResults = map(highOnly, x -> joinStrings(x.label, ' (', abbreviateNumber(currencyFormat(x.value)), ')')) lowResults = map(lowOnly, x -> joinStrings(x.label, ' (', abbreviateNumber(currencyFormat(x.value)), ')')) "Net Profit was unusually high in [[highResults]] and unusually low in [[lowResults]]." ]]
The output text is:
Net Profit was unusually high in June ($19.8K) and July ($14.5K) and unusually low in March (-$8.5K).