
anomalyDetectorExtremeValues

Detects subsets of contiguous values that are anomalous because all values in the subset are extremely high or extremely low. You might use this function to identify periods in which all monthly sales totals are extremely high.

An extreme value is one whose distance from the mean of the whole series exceeds the standard deviation by more than the threshold percentage. Values that are not extreme are deemed normal.
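
In other words, using μ and σ for the mean and standard deviation of the whole series and p for the THRESHOLD PERCENTAGE parameter, a value x is extreme when:

    |x − μ| > (1 + p/100) × σ

For example, a threshold of 30 marks as extreme any value whose distance from the mean exceeds 130% of the standard deviation.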

The result is a list of ATL objects, one for each detected subset of extreme values. The object gives the start and end dates for the subset, information about the extreme values, and other details that may be useful. See Result object fields for a full breakdown. The function returns an empty list if there are no anomalies — that is, no subsets of extreme values.

The function aggregates and compares data at the selected level of time granularity (daily, monthly, or quarterly). The function uses sum aggregation by default, but you can select a different type by using the function's optional parameter.

Parameters

  • SERIES (list)

    The input series of numbers. This must be a list or list-like object (e.g. a column or row) containing numbers.

  • DATES (list)

    A list of dates, one for each number in the input series. This must be a list or list-like object containing datetime objects, or strings in the 'yyyy-MM-dd' pattern.

  • TIME GRANULARITY (string)

    The granularity at which the data is compared. The options are 'daily', 'monthly', or 'quarterly'.

  • THRESHOLD PERCENTAGE (number)

    The percentage above standard deviation distance from the mean that values must be to count as extreme. For example, a value of 50 sets the threshold to 150% of the standard deviation, so all values in a contiguous subset must be that much above or below the mean for the subset to be anomalous.

  • WINDOW SIZE (number)

    The minimum number of values required for a subset to qualify as an anomaly. Must be an integer value >= 2.

  • AGGREGATION FUNCTION (function)

    Optional. The function used to aggregate values within the same time granularity period. The options are sum(), mean(), maxVal(), minVal(), or count(). Sum aggregation applies by default.

    Default: sum()
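
    For example, to compare monthly averages rather than monthly totals, you might pass mean() as the final argument (a sketch assuming the positional parameter order listed above):

    [[anomalyDetectorExtremeValues(Sales, Date, 'monthly', 30, 3, mean())]]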

Notes

  • The function removes null values from SERIES (and corresponding values in DATES) before performing its analysis.

  • The function removes null values from DATES (and corresponding values in SERIES) before performing its analysis.

  • If SERIES and DATES are of unequal length, the function truncates the longer list to ensure lists of equal length.

  • The input to THRESHOLD PERCENTAGE must be a non-negative number.

Result object fields

The function returns an ATL object for each detected anomaly — that is, each subset of extreme values.

Each result object contains these fields:

Field                   Description
startDate               The start date for the subset.
endDate                 The end date for the subset.
startValue              The first value in the subset.
endValue                The last value in the subset.
anomalyRange            This is always time series for anomalies detected by this function.
anomalyCategory         This is always collective for anomalies detected by this function.
anomalyType             This is always extreme for anomalies detected by this function.
anomalyClass            Indicates whether the anomaly is high, low, or mixed.
maxValue                The highest value in the subset.
maxValueDate            The date for the highest value in the subset.
minValue                The lowest value in the subset.
minValueDate            The date for the lowest value in the subset.
nearestNormalValue      The normal (non-extreme) value in the series that is nearest (in value) to the values in the subset.
nearestNormalDate       The date for the nearest normal value (see above).
normalMean              The mean of all normal values in the comparison group.
normalStdDev            The standard deviation for all normal values in the comparison group.

Important

The function aggregates and compares data at the selected level of time granularity, so fields such as startValue, maxValue, and nearestNormalValue hold aggregated values. See the examples for further guidance.
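
For example, with monthly granularity and the default sum aggregation, the three January 2020 rows in Example 1 below (5,000 + 4,000 + 3,000) are treated as a single value of 12,000, and it is that monthly total that would appear in fields such as startValue.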

Examples

Example 1 — monthly granularity

This example uses a dataset that is too large to display fully here. Instead, we've uploaded the data into THIS DOWNLOADABLE PROJECT. The project contains no script content, so you can paste in the example ATL and try it yourself.

The full dataset contains sales data for 2020 and 2021. For reference, the first six rows are:

        ID   Date         Day   Month   Quarter   Year   Sales
Row 1   1    2020-01-02   2     Jan     Q1        2020   5,000.00
Row 2   2    2020-01-11   11    Jan     Q1        2020   4,000.00
Row 3   3    2020-01-23   23    Jan     Q1        2020   3,000.00
Row 4   4    2020-02-04   4     Feb     Q1        2020   6,000.00
Row 5   5    2020-02-08   8     Feb     Q1        2020   2,000.00
Row 6   6    2020-02-17   17    Feb     Q1        2020   5,000.00

Suppose you want to look for periods (three months minimum) in which each monthly sales total is an extreme value. Also, suppose an extreme value is one 30% higher or lower than usual, allowing for normal variation as measured by standard deviation.

The ATL [[anomalyDetectorExtremeValues(Sales, Date, 'monthly', 30, 3)]] returns this result:

(
  (startDate = Jun 1, 2020, endDate = Aug 10, 2020, startValue = 26,000, endValue = 28,000, anomalyRange = time series, anomalyCategory = collective, anomalyType = extreme, anomalyClass = high, maxValue = 33,000, maxValueDate = Jul 3, 2020, minValue = 26,000, minValueDate = Jun 1, 2020, nearestNormalDate = Jun 3, 2021, nearestNormalValue = 15,000, normalMean = 12,222.22, normalStdDev = 1,165.97),
  (startDate = Feb 3, 2021, endDate = Apr 5, 2021, startValue = 4,000, endValue = 3,000, anomalyRange = time series, anomalyCategory = collective, anomalyType = extreme, anomalyClass = low, maxValue = 4,000, maxValueDate = Feb 3, 2021, minValue = 3,000, minValueDate = Apr 5, 2021, nearestNormalDate = May 10, 2020, nearestNormalValue = 10,000, normalMean = 12,222.22, normalStdDev = 1,165.97)
)

The results list contains two ATL objects, one for each anomalous subset. You can tell from the anomalyClass values that one anomaly is high and the other is low. The result is easier to understand by looking at this table of aggregated totals:

(Image: anomalyExtreme1.png, a table of the aggregated monthly totals)

The high anomaly comprises a three-month period from Jun 2020 to Aug 2020. Each aggregated total in this subset is extremely high. Similarly, the low anomaly comprises a period from Feb 2021 to Apr 2021 and each total in that subset is extremely low.

Here's how you might extract values from the result to produce narrative text:

[[
results = anomalyDetectorExtremeValues(Sales, Date, 'monthly', 30, 3)
highOnly = filter(results, x -> x.anomalyClass == 'high')[0:1]
lowOnly = filter(results, x -> x.anomalyClass == 'low')[0:1]
highStartDate = map(highOnly, x -> "[[formatDateTime(x.startDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.startValue))]])")
highEndDate = map(highOnly, x -> "[[formatDateTime(x.endDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.endValue))]])")
lowStartDate = map(lowOnly, x -> "[[formatDateTime(x.startDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.startValue))]])")
lowEndDate = map(lowOnly, x -> "[[formatDateTime(x.endDate, 'MMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.endValue))]])")
normalAvg = unique(map(results, x -> x.normalMean))
"Monthly sales were extremely high from [[highStartDate]] to [[highEndDate]] and extremely low from [[lowStartDate]] to [[lowEndDate]]. Excluding the anomalous periods, average monthly sales were [[abbreviateNumber(currencyFormat(normalAvg))]]."
]]

The output text is:

Monthly sales were extremely high from Jun 2020 ($26K) to Aug 2020 ($28K) and extremely low from Feb 2021 ($4K) to Apr 2021 ($3K). Excluding the anomalous periods, average monthly sales were $12.2K.

 

NOTE ON QUARTERLY GRANULARITY

If you ran a similar analysis but with the TIME GRANULARITY parameter set to quarterly, the function would return an empty list. You can try this in the downloadable project using this ATL: [[anomalyDetectorExtremeValues(Sales, Date, 'quarterly', 30, 3)]].

It's easier to understand the result (no anomalies) by looking at this table of aggregated totals:

(Image: anomalyExtreme3.png, a table of the aggregated quarterly totals)

Remember that the WINDOW SIZE parameter is set to 3. The total for Q3, 2020 (73,000) stands out as extremely high, but it's not detected as an anomaly because it's a single value. As per the ATL, the function looks for sets of three or more extreme values.

Tip

To detect single values that are extremely high or extremely low, use anomalyDetectorContextual instead.

 

Example 2 — daily granularity

Assume a "Describe the Table" project with this data:

         ID   Date         Day   Month   Quarter   Year   Sales
Row 1    1    2021-02-01   1     Feb     Q1        2021   9,000.00
Row 2    2    2021-02-02   2     Feb     Q1        2021   8,000.00
Row 3    3    2021-02-03   3     Feb     Q1        2021   10,000.00
Row 4    4    2021-02-04   4     Feb     Q1        2021   8,000.00
Row 5    5    2021-02-05   5     Feb     Q1        2021   7,000.00
Row 6    6    2021-02-06   6     Feb     Q1        2021   19,000.00
Row 7    7    2021-02-07   7     Feb     Q1        2021   22,000.00
Row 8    8    2021-02-08   8     Feb     Q1        2021   1,000.00
Row 9    9    2021-02-09   9     Feb     Q1        2021   2,000.00
Row 10   10   2021-02-10   10    Feb     Q1        2021   8,000.00
Row 11   11   2021-02-11   11    Feb     Q1        2021   10,000.00
Row 12   12   2021-02-12   12    Feb     Q1        2021   9,000.00
Row 13   13   2021-02-13   13    Feb     Q1        2021   8,000.00
Row 14   14   2021-02-14   14    Feb     Q1        2021   10,000.00

The table contains sales data for the first two weeks of Feb 2021. Typically, you would have sales data covering a longer period, but we've kept the dataset small to simplify the example. Use THIS DOWNLOADABLE PROJECT if you want to try the example yourself.

Suppose you want to look for periods (three days minimum) in which each daily sales total is an extreme value. Also, suppose an extreme value is one 30% higher or lower than usual, allowing for normal variation as measured by standard deviation.

The ATL [[anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3)]] returns this result:

(
  (startDate = Feb 6, 2021, endDate = Feb 9, 2021, startValue = 19,000, endValue = 2,000, anomalyRange = time series, anomalyCategory = collective, anomalyType = extreme, anomalyClass = mixed, maxValue = 22,000, maxValueDate = Feb 7, 2021, minValue = 1,000, minValueDate = Feb 8, 2021, nearestNormalDate = Feb 5, 2021, nearestNormalValue = 7,000, normalMean = 8,700, normalStdDev = 1,059.35)
)

The results list contains a single ATL object. The anomalyClass value is mixed, which means the subset contains a mix of extremely high values and extremely low values. The result is easier to understand by looking at this table of aggregated totals:

(Image: anomalyExtreme2.png, a table of the aggregated daily totals)

The anomaly comprises a four-day period from Feb 6, 2021 to Feb 9, 2021. As you can see, there are two extremely high values and two extremely low values. Note that the highest and lowest values (22,000 and 1,000) are provided in the result object.

Here's how you might extract values from the result to produce narrative text:

[[
results = anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3)
mixedOnly = filter(results, x -> x.anomalyClass == 'mixed')[0:1]
startDate = map(mixedOnly, x -> x.startDate)
endDate = map(mixedOnly, x -> x.endDate)
maxValDetails = map(mixedOnly, x -> "[[x.maxValueDate]] ([[abbreviateNumber(currencyFormat(x.maxValue))]])")
minValDetails = map(mixedOnly, x -> "[[x.minValueDate]] ([[abbreviateNumber(currencyFormat(x.minValue))]])")
normalAvg = abbreviateNumber(currencyFormat(map(mixedOnly, x -> x.normalMean)))
"Daily sales were either extremely high or extremely low from [[startDate]] to [[endDate]]. The highest sales total in this period was on [[maxValDetails]] and the lowest was on [[minValDetails]]. Excluding the anomalous period, average daily sales were [[normalAvg]]."
]]

The output text is:

Daily sales were either extremely high or extremely low from Feb 6, 2021 to Feb 9, 2021. The highest sales total in this period was on Feb 7, 2021 ($22K) and the lowest was on Feb 8, 2021 ($1K). Excluding the anomalous period, average daily sales were $8.7K.
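
You could also use the maxValue and minValue fields to report the spread between the best and worst days in the anomalous period. Here's a sketch, assuming ordinary arithmetic is available inside the lambda passed to map():

[[
results = anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3)
mixedOnly = filter(results, x -> x.anomalyClass == 'mixed')[0:1]
spread = map(mixedOnly, x -> x.maxValue - x.minValue)
"Daily sales totals in the anomalous period varied by as much as [[abbreviateNumber(currencyFormat(spread))]]."
]]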

Note

All examples above use sum aggregation, which applies by default. Remember that you can pick a different aggregation type, e.g. mean() or count(), using the optional parameter.
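
For instance, to look for runs of days with an extremely high or low number of sales transactions (rather than extreme sales totals), you might aggregate with count() instead (again assuming the six-parameter form listed under Parameters):

[[anomalyDetectorExtremeValues(Sales, Date, 'daily', 30, 3, count())]]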