Skip to main content

anomalyDetectorOutliers

Applies Grubbs’ test to detect anomalies (statistical outliers) in a series of numbers assumed to come from a normally distributed population.

The result is a list of ATL objects, one for each detected anomaly. An example object is:

( label = March, value = -8,500, anomalyCategory = outlier, anomalyClass = low, nearestNormalLabel = January, nearestNormalValue = 5,900, normalMean = 7,088.89, normalStdDev = 826.81 )

See Result object fields for more about the results object.

The function returns an empty list when there are no anomalies.

Parameters

  • SERIES (list)

    The input series of numbers.

    This must be a list or list-like object (e.g. a column or row) containing at least seven numbers.

  • LABELS (list)

    A list of labels, one for each number in the input series.

    This must be a list or list-like object containing strings.

Notes

  • The function removes any null values from SERIES before performing its calculation.

  • If LABELS contains a null value, the function uses 'No Label' for that label in the result.

  • If SERIES is longer than LABELS, the function automatically truncates SERIES to match the length of LABELS.

Result object fields

Each result object contains these fields:

Field

Description

label

The label for the anomalous value.

value

The anomalous value.

anomalyCategory

This is always outlier for anomalies detected by this function.

anomalyClass

Indicates whether the anomaly is high or low.

nearestNormalLabel

The label for the value that is nearest in value to the anomaly but is not itself anomalous.

nearestNormalValue

The value that is nearest in value to the anomaly but is not itself anomalous.

normalMean

The mean of all values in the comparison group excluding anomalies.

normalStdDev

The standard deviation for all values in the comparison group excluding anomalies.

Examples

Assume a "Describe the Table" project with this data:

Code

Month

Orders

salesRev

COGS

otherRev

otherExp

netProfit

Row 1

1001

January

12

14,000.00

7,500.00

1,000.00

1,500.00

5,900.00

Row 2

1002

February

15

15,500.00

7,000.00

1,000.00

1,500.00

8,000.00

Row 3

1003

March

11

12,750.00

9,500.00

1,200.00

13,000.00

-8,500.00

Row 4

1004

April

17

16,250.00

8,000.00

1,000.00

1,500.00

7,750.00

Row 5

1005

May

18

16,000.00

8,250.00

1,200.00

1,500.00

7,450.00

Row 6

1006

June

43

40,000.00

20,000.00

1,250.00

1,500.00

19,750.00

Row 7

1007

July

35

32,500.00

17,750.00

1,250.00

1,500.00

14,500.00

Row 8

1008

August

16

15,750.00

7,500.00

1,250.00

1,500.00

8,200.00

Row 9

1009

September

14

14,250.00

7,000.00

1,000.00

1,500.00

6,750.00

Row 10

1010

October

16

15,500.00

8,000.00

1,000.00

1,500.00

7,000.00

Row 11

1011

November

15

14,500.00

8,000.00

1,000.00

1,500.00

6,000.00

Row 12

1012

December

15

15,250.00

8,000.00

1,000.00

1,500.00

6,750.00

Suppose you want to find the outliers for Orders and identify them by Month.

The ATL [[anomalyDetectorOutliers(Orders, Month)]] returns this result:

(

( label = June, value = 43, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = May, nearestNormalValue = 18, normalMean = 14.9, normalStdDev = 2.13 ),

( label = July, value = 35, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = May, nearestNormalValue = 18, normalMean = 14.9, normalStdDev = 2.13 )

)

The results list contains two ATL objects. Each object is a series of key–value pairs; and from these you can identify the anomalous values (43 and 35), the months they occurred (June and July), and whether they are high or low outliers.

Note

The function detects statistical outliers; therefore, the anomalyCategory value is always outlier. The anomalyClass value is either high or low depending on whether the outlier is above or below the mean.

Now, suppose you want to find the outliers for netProfit and identify them by Month.

The ATL is [[anomalyDetectorOutliers(netProfit, Month)]] and the result is:

(

( label = March, value = -8,500, anomalyCategory = outlier, anomalyClass = low, nearestNormalLabel = January, nearestNormalValue = 5,900, normalMean = 7,088.89, normalStdDev = 826.81 ),

( label = June, value = 19,750, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = August, nearestNormalValue = 8,200, normalMean = 7,088.89, normalStdDev = 826.81 ),

( label = July, value = 14,500, anomalyCategory = outlier, anomalyClass = high, nearestNormalLabel = August, nearestNormalValue = 8,200, normalMean = 7,088.89, normalStdDev = 826.81 )

)

This time the function finds three outliers. Two are high outliers and one is low.

Tip

You can filter by anomalyClass to get the high outliers only or the low outliers only:

[[filter(anomalyDetectorOutliers(netProfit, Month), x -> x.anomalyClass == 'high')]]

[[filter(anomalyDetectorOutliers(netProfit, Month), x -> x.anomalyClass == 'low')]]

Here's how you might extract values from the result to produce narrative text:

[[

results = anomalyDetectorOutliers(netProfit, Month)

highOnly = filter(results, x -> x.anomalyClass == 'high')

lowOnly = filter(results, x -> x.anomalyClass == 'low')

highResults = map(highOnly, x -> joinStrings(x.label, ' (', abbreviateNumber(currencyFormat(x.value)), ')'))

lowResults = map(lowOnly, x -> joinStrings(x.label, ' (', abbreviateNumber(currencyFormat(x.value)), ')'))

"Net Profit was unusually high in [[highResults]] and unusually low in [[lowResults]]."

]]

The output text is:

Net Profit was unusually high in June ($19.8K) and July ($14.5K) and unusually low in March (-$8.5K).