
anomalyDetectorExtremeValues

Detects subsets of contiguous values that are anomalous because all values in the subset are extremely high or extremely low. You might use this function to identify periods in which all monthly sales totals are extremely high.

An extreme value is one whose distance from the mean of the whole series exceeds the standard deviation by more than the threshold percentage. Values that are not extreme are deemed normal.
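
In other words, using μ and σ for the mean and standard deviation of the whole series and p for the THRESHOLD PERCENTAGE parameter, a value x is extreme when:

    |x − μ| > (1 + p/100) × σ

For example, a threshold of 30 marks as extreme any value whose distance from the mean exceeds 130% of the standard deviation.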

The result is a list of ATL objects, one for each detected subset of extreme values. The object gives the start and end dates for the subset, information about the extreme values, and other details that may be useful. See Result object fields for a full breakdown. The function returns an empty list if there are no anomalies — that is, no subsets of extreme values.

The function aggregates and compares data at the selected level of time granularity (daily, monthly, or quarterly). The function uses sum aggregation by default, but you can select a different type by using the function's optional parameter.

Parameters

  • SERIES (list)

    The input series of numbers. This must be a list or list-like object (e.g. a column or row) containing numbers.

  • DATES (list)

    A list of dates, one for each number in the input series. This must be a list or list-like object containing datetime objects, or strings in the 'yyyy-MM-dd' pattern.

  • TIME GRANULARITY (string)

    The granularity at which the data is compared. The options are 'daily', 'monthly', or 'quarterly'.

  • THRESHOLD PERCENTAGE (number)

    The percentage above standard deviation distance from the mean that values must be to count as extreme. For example, a value of 50 sets the threshold to 150% of the standard deviation, so all values in a contiguous subset must be that much above or below the mean for the subset to be anomalous.

  • WINDOW SIZE (number)

    The minimum number of values required for a subset to qualify as an anomaly. Must be an integer value >= 2.

  • AGGREGATION FUNCTION (function)

    Optional. The function used to aggregate values within the same time granularity period. The options are sum(), mean(), maxVal(), minVal(), or count(). Sum aggregation applies by default.

    Default: sum()
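
    For example, to compare monthly averages rather than monthly totals, you might pass mean() as the final argument (a sketch assuming the positional parameter order listed above):

    [[anomalyDetectorExtremeValues(Sales, Date, 'monthly', 30, 3, mean())]]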

Notes

  • The function removes null values from SERIES (and corresponding values in DATES) before performing its analysis.

  • The function removes null values from DATES (and corresponding values in SERIES) before performing its analysis.

  • If SERIES and DATES are of unequal length, the function truncates the longer list to ensure lists of equal length.

  • The input to THRESHOLD PERCENTAGE must be a non-negative number.

Result object fields

The function returns an ATL object for each detected anomaly — that is, each subset of extreme values.

Each result object contains these fields:

Field                   Description
startDate               The start date for the subset.
endDate                 The end date for the subset.
startValue              The first value in the subset.
endValue                The last value in the subset.
anomalyRange            This is always time series for anomalies detected by this function.
anomalyCategory         This is always collective for anomalies detected by this function.
anomalyType             This is always extreme for anomalies detected by this function.
anomalyClass            Indicates whether the anomaly is high, low, or mixed.
maxValue                The highest value in the subset.
maxValueDate            The date for the highest value in the subset.
minValue                The lowest value in the subset.
minValueDate            The date for the lowest value in the subset.
nearestNormalValue      The normal (non-extreme) value in the series that is nearest (in value) to the values in the subset.
nearestNormalDate       The date for the nearest normal value (see above).
normalMean              The mean of all normal values in the comparison group.
normalStdDev            The standard deviation for all normal values in the comparison group.

Important

The function aggregates and compares data at the selected level of time granularity, so fields such as startValue, maxValue, and nearestNormalValue hold aggregated values. See the examples for further guidance.
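
For example, with monthly granularity and the default sum aggregation, the three January 2020 rows in Example 1 below (5,000 + 4,000 + 3,000) are treated as a single value of 12,000, and it is that monthly total that would appear in fields such as startValue.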

Examples

Example 1 — monthly granularity

This example uses a dataset that is too large to display fully here. Instead, we've uploaded the data into THIS DOWNLOADABLE PROJECT. The project contains no script content, so you can paste in the example ATL and try it yourself.

The full dataset contains sales data for 2020 and 2021. For reference, the first six rows are:

        ID   Date         Day   Month   Quarter   Year   Sales
Row 1   1    2020-01-02   2     Jan     Q1        2020   5,000.00
Row 2   2    2020-01-11   11    Jan     Q1        2020   4,000.00
Row 3   3    2020-01-23   23    Jan     Q1        2020   3,000.00
Row 4   4    2020-02-04   4     Feb     Q1        2020   6,000.00
Row 5   5    2020-02-08   8     Feb     Q1        2020   2,000.00
Row 6   6    2020-02-17   17    Feb     Q1        2020   5,000.00

Suppose you want to look for periods (three months minimum) in which each monthly sales total is an extreme value. Also, suppose an extreme value is one 30% higher or lower than usual, allowing for normal variation as measured by standard deviation.

The ATL [[anomalyDetectorExtremeValues(Sales, Date, 'monthly', 30, 3)]] returns this result:

(
  (startDate = Jun 1, 2020, endDate = Aug 10, 2020, startValue = 26,000, endValue = 28,000, anomalyRange = time series, anomalyCategory = collective, anomalyType = extreme, anomalyClass = high, maxValue = 33,000, maxValueDate = Jul 3, 2020, minValue = 26,000, minValueDate = Jun 1, 2020, nearestNormalDate = Jun 3, 2021, nearestNormalValue = 15,000, normalMean = 12,222.22, normalStdDev = 1,165.97),
  (startDate = Feb 3, 2021, endDate = Apr 5, 2021, startValue = 4,000, endValue = 3,000, anomalyRange = time series, anomalyCategory = collective, anomalyType = extreme, anomalyClass = low, maxValue = 4,000, maxValueDate = Feb 3, 2021, minValue = 3,000, minValueDate = Apr 5, 2021, nearestNormalDate = May 10, 2020, nearestNormalValue = 10,000, normalMean = 12,222.22, normalStdDev = 1,165.97)
)

The results list contains two ATL objects, one for each anomalous subset. You can tell from the anomalyClass values that one anomaly is high and the other is low. The result is easier to understand by looking at this table of aggregated totals:

(Image: anomalyExtreme1.png, a table of the aggregated monthly totals)

The high anomaly comprises a three-month period from Jun 2020 to Aug 2020. Each aggregated total in this subset is extremely high. Similarly, the low anomaly comprises a period from Feb 2021 to Apr 2021 and each total in that subset is extremely low.

Here's how you might extract values from the result to produce narrative text:

[[
results = anomalyDetectorExtremeValues(Sales, Date, 'monthly', 30, 3)
highOnly = filter(results, x -> x.anomalyClass == 'high')[0:1]
lowOnly = filter(results, x -> x.anomalyClass == 'low')[0:1]
highStartDate = map(highOnly, x -> "[[formatDateTime(x.startDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.startValue))]])")
highEndDate = map(highOnly, x -> "[[formatDateTime(x.endDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.endValue))]])")
lowStartDate = map(lowOnly, x -> "[[formatDateTime(x.startDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.startValue))]])")
lowEndDate = map(lowOnly, x -> "[[formatDateTime(x.endDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.endValue))]])")
normalAvg = unique(map(results, x -> x.normalMean))
"Monthly sales were extremely high from [[highStartDate]] to [[highEndDate]] and extremely low from [[lowStartDate]] to [[lowEndDate]]. Excluding the anomalous periods, average monthly sales were [[abbreviateNumber(currencyFormat(normalAvg))]]."
]]

The output text is:

Monthly sales were extremely high from Jun 2020 ($26K) to Aug 2020 ($28K) and extremely low from Feb 2021 ($4K) to Apr 2021 ($3K). Excluding the anomalous periods, average monthly sales were $12.2K.

 

NOTE ON QUARTERLY GRANULARITY

If you ran a similar analysis but with the TIME GRANULARITY parameter set to quarterly, the function would return an empty list. You can try this in the downloadable project using this ATL: [[anomalyDetectorExtremeValues(Sales, Date, 'quarterly', 30, 3)]].

It's easier to understand the result (no anomalies) by looking at this table of aggregated totals:

(Image: anomalyExtreme3.png, a table of the aggregated quarterly totals)

Remember that the WINDOW SIZE parameter is set to 3. The total for Q3, 2020 (73,000) stands out as extremely high, but it's not detected as an anomaly because it's a single value. As per the ATL, the function looks for sets of three or more extreme values.

Tip

To detect single values that are extremely high or extremely low, use anomalyDetectorContextual instead.

 

Example 2 — daily granularity

Assume a "Describe the Table" project with this data:

         ID   Date         Day   Month   Quarter   Year   Sales
Row 1    1    2021-02-01   1     Feb     Q1        2021   9,000.00
Row 2    2    2021-02-02   2     Feb     Q1        2021   8,000.00
Row 3    3    2021-02-03   3     Feb     Q1        2021   10,000.00
Row 4    4    2021-02-04   4     Feb     Q1        2021   8,000.00
Row 5    5    2021-02-05   5     Feb     Q1        2021   7,000.00
Row 6    6    2021-02-06   6     Feb     Q1        2021   19,000.00
Row 7    7    2021-02-07   7     Feb     Q1        2021   22,000.00
Row 8    8    2021-02-08   8     Feb     Q1        2021   1,000.00
Row 9    9    2021-02-09   9     Feb     Q1        2021   2,000.00
Row 10   10   2021-02-10   10    Feb     Q1        2021   8,000.00
Row 11   11   2021-02-11   11    Feb     Q1        2021   10,000.00
Row 12   12   2021-02-12   12    Feb     Q1        2021   9,000.00
Row 13   13   2021-02-13   13    Feb     Q1        2021   8,000.00
Row 14   14   2021-02-14   14    Feb     Q1        2021   10,000.00

The table contains sales data for the first two weeks of Feb 2021. Typically, you would have sales data covering a longer period, but we've kept the dataset small to simplify the example. Use THIS DOWNLOADABLE PROJECT if you want to try the example yourself.

Suppose you want to look for periods (three days minimum) in which each daily sales total is an extreme value. Also, suppose an extreme value is one 30% higher or lower than usual, allowing for normal variation as measured by standard deviation.

The ATL [[anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3)]] returns this result:

(
  (startDate = Feb 6, 2021, endDate = Feb 9, 2021, startValue = 19,000, endValue = 2,000, anomalyRange = time series, anomalyCategory = collective, anomalyType = extreme, anomalyClass = mixed, maxValue = 22,000, maxValueDate = Feb 7, 2021, minValue = 1,000, minValueDate = Feb 8, 2021, nearestNormalDate = Feb 5, 2021, nearestNormalValue = 7,000, normalMean = 8,700, normalStdDev = 1,059.35)
)

The results list contains a single ATL object. The anomalyClass value is mixed, which means the subset contains a mix of extremely high values and extremely low values. The result is easier to understand by looking at this table of aggregated totals:

(Image: anomalyExtreme2.png, a table of the aggregated daily totals)

The anomaly comprises a four-day period from Feb 6, 2021 to Feb 9, 2021. As you can see, there are two extremely high values and two extremely low values. Note that the highest and lowest values (22,000 and 1,000) are provided in the result object.

Here's how you might extract values from the result to produce narrative text:

[[
results = anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3)
mixedOnly = filter(results, x -> x.anomalyClass == 'mixed')[0:1]
startDate = map(mixedOnly, x -> x.startDate)
endDate = map(mixedOnly, x -> x.endDate)
maxValDetails = map(mixedOnly, x -> "[[x.maxValueDate]] ([[abbreviateNumber(currencyFormat(x.maxValue))]])")
minValDetails = map(mixedOnly, x -> "[[x.minValueDate]] ([[abbreviateNumber(currencyFormat(x.minValue))]])")
normalAvg = abbreviateNumber(currencyFormat(map(mixedOnly, x -> x.normalMean)))
"Daily sales were either extremely high or extremely low from [[startDate]] to [[endDate]]. The highest sales total in this period was on [[maxValDetails]] and the lowest was on [[minValDetails]]. Excluding the anomalous period, average daily sales were [[normalAvg]]."
]]

The output text is:

Daily sales were either extremely high or extremely low from Feb 6, 2021 to Feb 9, 2021. The highest sales total in this period was on Feb 7, 2021 ($22K) and the lowest was on Feb 8, 2021 ($1K). Excluding the anomalous period, average daily sales were $8.7K.
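
You could also use the maxValue and minValue fields to report the spread between the best and worst days in the anomalous period. Here's a sketch, assuming ordinary arithmetic is available inside the lambda passed to map():

[[
results = anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3)
mixedOnly = filter(results, x -> x.anomalyClass == 'mixed')[0:1]
spread = map(mixedOnly, x -> x.maxValue - x.minValue)
"Daily sales totals in the anomalous period varied by as much as [[abbreviateNumber(currencyFormat(spread))]]."
]]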

Note

All examples above use sum aggregation, which applies by default. Remember that you can pick a different aggregation type, e.g. mean() or count(), using the optional parameter.
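
For instance, to look for runs of days with an extremely high or low number of sales transactions (rather than extreme sales totals), you might aggregate with count() instead (again assuming the six-parameter form listed under Parameters):

[[anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3, count())]]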