anomalyDetectorContextual
Detects values that are anomalous within a seasonal context using the selected time granularity. You might use this function to detect unusually high sales for June or unusually low total profit for Q4.
The result is a list of ATL objects, one for each detected anomaly. The object provides the anomaly value, the date it occurred, and some subsidiary information that may be useful — see Result object fields for a full breakdown.
The function detects two types of contextual anomaly: scale and directional.
A scale anomaly is when a value is unusually high or low compared to other values in the same group.
A directional anomaly is when a value’s directional change is different from that of all other values in the same group. For example, if Q3 sales are up from Q2 sales in 2021 but are down in all other years, the Q3, 2021 value is directionally anomalous.
The function returns an empty list if there are no anomalies.
Parameters
SERIES (list)
The input series of numbers. This must be a list or list-like object (e.g. a column or row) containing numbers.
DATES (list)
A list of dates, one for each number in the input series. This must be a list or list-like object containing datetime objects, or strings in the 'yyyy-MM-dd' pattern.
TIME GRANULARITY (string)
The granularity at which the data is compared. The options are daily, monthly, or quarterly.
THRESHOLD PERCENTAGE (number)
The percentage above standard deviation distance from the mean that a value must be to count as a scale anomaly. For example, a value of 20 sets the threshold to 120% of the standard variation, which means a value must be that amount above or below the mean (for that value's comparison group) to be anomalous. The parameter has no bearing on directional anomalies.
AGGREGATION FUNCTION (function)
Optional. The function used to aggregate values within the same time granularity period. The options are sum(), mean(), maxVal(), minVal(), or count(). Sum aggregation applies by default.
Default: sum()
Notes
The function removes null values from SERIES (and corresponding values in DATES) before performing its analysis.
The function removes null values from DATES (and corresponding values in SERIES) before performing its analysis.
If SERIES and DATES are of unequal length, the function truncates the longer list to ensure lists of equal length.
The input to THRESHOLD PERCENTAGE must be a non-negative number.
Result object fields
The function returns an ATL object for each detected anomaly. The object fields are:
Field | Description |
---|---|
date | The date for the anomalous value. |
value | The anomalous value. |
anomalyCategory | This is always contextual for anomalies detected by this function. |
anomalyRange | This is always time series for anomalies detected by this function. |
anomalyType | Indicates whether the anomaly type is scale or directional. |
anomalyClass | Indicates whether the anomaly is high or low (scale) or up or down (directional). |
previousDate | The date for the previous value (see below). |
previousValue | The value previous to the anomalous value. |
percentageChange * | The percentage change from the previous value to the anomalous value. |
nearestNormalDate † | The date for the nearest normal value. |
nearestNormalValue † | The value that is nearest in value to the anomaly but is not itself anomalous. |
normalChangesMean | The mean of all changes in the comparison group excluding the anomalous one. |
normalChangesStdDev | The standard deviation for all changes in the comparison group excluding the anomalous one. |
normalMean † | The mean of all values in the comparison group excluding the anomaly. |
normalStdDev † | The standard deviation for all values in the comparison group excluding the anomaly. |
normalPercentageChangesMean ‡ | The mean of all percentage changes in the comparison group excluding the anomalous one. |
* When the divisor value = 0, the returned value for this field is NaN (Not a Number).
† This field appears in the result object for scale anomalies only.
‡ This field appears in the result object for directional anomalies only.
Important
The function aggregates and compares data at the selected level of time granularity, so the value for fields such as value, previousValue, and nearestNormalValue is an aggregated value. See the examples for further guidance.
Examples
Example 1 — Monthly granularity (scale anomalies)
This example uses a dataset that is far too large to display fully here. Instead, we've uploaded the data into THIS DOWNLOADABLE PROJECT. The project contains no script content, so you can copy-paste in the example ATL and try it yourself.
The full dataset contains sales data for 2018 through 2021. For reference, the first six rows are:
ID | Date | Day | Month | Quarter | Year | Sales | |
---|---|---|---|---|---|---|---|
Row 1 | 1 | 2018-01-02 | 2 | Jan | Q1 | 2018 | 2,000.00 |
Row 2 | 2 | 2018-01-11 | 11 | Jan | Q1 | 2018 | 3,000.00 |
Row 3 | 3 | 2018-01-23 | 23 | Jan | Q1 | 2018 | 3,000.00 |
Row 4 | 4 | 2018-02-04 | 4 | Feb | Q1 | 2018 | 6,000.00 |
Row 5 | 5 | 2018-02-08 | 8 | Feb | Q1 | 2018 | 11,000.00 |
Row 6 | 6 | 2018-02-17 | 17 | Feb | Q1 | 2018 | 8,000.00 |
Suppose you want to look for scale anomalies by comparing the sales data for equivalent months. For example, you might want to know if the sales total for June 2021 is unusually high or low compared to June sales in other years. Also, by unusually high or low, you mean values that are 40% higher or lower than usual, allowing for normal variation (as measured by standard deviation).
The ATL is: [[anomalyDetectorContextual(Sales, Date, 'monthly', 40)]]
and the result is.
|
The results list contains five ATL objects. There are three scale anomalies and two directional anomalies. For this example, we want to focus on the scale anomalies. You can filter by anomalyType to leave the scale anomalies only:
[[filter(anomalyDetectorContextual(Sales, Date, 'monthly', 40), x -> x.anomalyType == 'scale')]]
The filtered result is:
|
There are three scale anomalies: one high and two low. The anomalous values are the aggregated totals for Mar 2019, Jun 2020, and Dec 2020. Note that the date value in each object is the date for the first value during these periods.
Note also that the detected values do not represent the highest or lowest aggregated totals in the dataset. They are anomalies because they are unusually high or low for a specific month. This is easier to see by looking at this table of aggregated totals:
The aggregated total for Mar 2019 (70,000) is not the highest overall but it's unusually high compared to the equivalent totals for 2018, 2020, and 2021. Similarly, the values for Jun 2020 and Dec 2020 are anomalous because they are unusually low for their group.
Here's how you might extract values from the result to produce narrative text:
[[ results = anomalyDetectorContextual(Sales, Date, 'monthly', 40) scaleAnomalies = filter(results, x -> x.anomalyType == 'scale') highOnly = filter(scaleAnomalies, x -> x.anomalyClass == 'high') lowOnly = filter(scaleAnomalies, x -> x.anomalyClass == 'low') highValues = map(highOnly, x -> "[[formatDateTime(x.date,'MMMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.value))]])") lowValues = map(lowOnly, x -> "[[formatDateTime(x.date,'MMMM yyyy')]] ([[abbreviateNumber(currencyFormat(x.value))]])") yearRange = joinStrings('the years ', unique(Year)[ :1], ' through ', unique(Year)[-1: ]) "Comparing equivalent months for [[yearRange]], total sales were unusually high in [[highValues]] and unusually low in [[lowValues]]." ]]
The output text is:
Comparing equivalent months for the years 2018 through 2021, total sales were unusually high in March 2019 ($70K) and unusually low in June 2020 ($10K) and December 2020 ($13K).
Tip
If you have downloaded the project with the example data, try running a similar analysis but with the TIME GRANULARITY parameter set to quarterly rather than monthly. Any anomalous value detected will be an aggregated total for a specific quarter (e.g. Q1, 2019) rather than a specific month (e.g. Mar 2019).
Example 2 — Monthly granularity (directional anomalies)
In Example 1, the function initially found a mixture of scale and directional anomalies. For simplicity, we filtered the result and focused on the scale anomalies only. In this example, we'll take a closer look at the directional anomalies.
For this example, assume a "Describe the Table" project with the same data used for Example 1.
[[filter(anomalyDetectorContextual(Sales, Date, 'monthly', 40), x -> x.anomalyType == 'directional')]]
returns this result:
|
There are two directional anomalies: one in Jun 2020 and the other in Dec 2020. With directional anomalies, the anomalyClass value is either up or down. For both anomalies here, the value is down.
It's easier to understand these anomalies by looking at this table of aggregated totals:
The Jun 2020 total (10,000) is anomalous because it's down from the preceding month's total (28,000), which makes it unique for its group. Note that the May-to-Jun direction is up for all other years in the dataset. The Dec 2020 total (13,000) is anomalous for the same reason: in all other years, the Nov-to-Dec direction is up rather than down.
Here's how you might extract values from the result to produce narrative text:
[[ results = anomalyDetectorContextual(Sales, Date, 'monthly', 40) dirOnly = filter(results, x -> x.anomalyType == 'directional') dirOnlyCount = len(dirOnly) dirDetails = map(dirOnly, x -> "[[formatDateTime(x.date,'MMMM yyyy')]] (sales [[x.anomalyClass]])") yearRange = joinStrings(unique(Year)[ :1], ' through ', unique(Year)[-1: ]) "The analysis compared equivalent months for [[yearRange]] and found [[numToWords(dirOnlyCount)]] directional [[inflectNoun('anomaly',dirOnlyCount)]]. [[if(len(dirOnly) > 0){The [[inflectNoun('anomaly',dirOnlyCount)]] occurred in [[dirDetails]]}]]." ]]
The output text is:
The analysis compared equivalent months for 2018 through 2021 and found two directional anomalies. The anomalies occurred in June 2020 (sales down) and December 2020 (sales down).
Example 3 — Daily granularity
Assume a "Describe the Table" project with this data:
ID | Date | Day | Month | Quarter | Year | Sales | |
---|---|---|---|---|---|---|---|
Row 1 | 1 | 2021-02-01 | 1 | Feb | Q1 | 2021 | 6,000.00 |
Row 2 | 2 | 2021-02-02 | 2 | Feb | Q1 | 2021 | 9,000.00 |
Row 3 | 3 | 2021-02-03 | 3 | Feb | Q1 | 2021 | 9,000.00 |
Row 4 | 4 | 2021-02-04 | 4 | Feb | Q1 | 2021 | 18,000.00 |
Row 5 | 5 | 2021-02-08 | 8 | Feb | Q1 | 2021 | 12,000.00 |
Row 6 | 6 | 2021-02-09 | 9 | Feb | Q1 | 2021 | 12,000.00 |
Row 7 | 7 | 2021-02-10 | 10 | Feb | Q1 | 2021 | 16,000.00 |
Row 8 | 8 | 2021-02-11 | 11 | Feb | Q1 | 2021 | 24,000.00 |
Row 9 | 9 | 2021-02-15 | 15 | Feb | Q1 | 2021 | 7,000.00 |
Row 10 | 10 | 2021-02-16 | 16 | Feb | Q1 | 2021 | 12,000.00 |
Row 11 | 11 | 2021-02-17 | 17 | Feb | Q1 | 2021 | 13,000.00 |
Row 12 | 12 | 2021-02-18 | 18 | Feb | Q1 | 2021 | 22,000.00 |
Row 13 | 13 | 2021-02-22 | 22 | Feb | Q1 | 2021 | 7,000.00 |
Row 14 | 14 | 2021-02-23 | 23 | Feb | Q1 | 2021 | 9,000.00 |
Row 15 | 15 | 2021-02-24 | 24 | Feb | Q1 | 2021 | 9,000.00 |
Row 16 | 16 | 2021-02-25 | 25 | Feb | Q1 | 2021 | 10,000.00 |
The table contains sales data for Feb 2021 only. Typically, you would have sales data covering a longer period, but we've kept the dataset small to simplify the example. Use THIS DOWNLOADABLE PROJECT if you want to try the example yourself.
Suppose that you want to know if sales on a particular Monday performed significantly better or worse than on other Mondays — and also run the same analysis for all other weekdays. Also, suppose that significantly better or worse means values that are 30% higher or lower than usual, allowing for normal variation (as measured by standard deviation).
The ATL is: [[anomalyDetectorContextual(Sales, Date, 'daily', 30)]]
and the result is:
|
There are two scale anomalies, one high and one low. Note that the detected values are not the highest and lowest in the Sales column. They are anomalies because they are unusually high/low for a specific day of the week.
This is easier to see by looking at this table, which shows the breakdown for each week by day:
The high scale anomaly occurred on Feb 8, 2021, which is a Monday. The anomalous value (12,000) is not the highest aggregated value in the table, but it's unusually high when compared to the aggregated totals for the other Mondays (Feb 1, Feb 15, and Feb 22).
Similarly, the low scale anomaly occurred on Feb 25, 2021, which is a Thursday. The anomalous value (10,000) is not the lowest aggregated value in the table, but it's unusually low when compared to the other aggregated totals in its comparison group.
Here's how you might extract values from the result to produce narrative text:
[[ global.results = anomalyDetectorContextual(Sales, Date, 'daily', 30) global.scaleOnly = filter(results, x -> x.anomalyType == 'scale') global.highScale = filter(scaleOnly, x -> x.anomalyClass == 'high') global.lowScale = filter(scaleOnly, x -> x.anomalyClass == 'low');'' ]] [[map(highScale, x -> "sales on [[x.date]] ([[abbreviateNumber(currencyFormat(x.value))]]) were unusually high for a [[formatDateTime(x.date,'EEEE')]] (normal average = [[abbreviateNumber(currencyFormat(x.normalMean))]])")]]. [[map(lowScale, x -> "sales on [[x.date]] ([[abbreviateNumber(currencyFormat(x.value))]]) were unusually low for a [[formatDateTime(x.date,'EEEE')]] (normal average = [[abbreviateNumber(currencyFormat(x.normalMean))]])")]].
The output text is:
Sales on Feb 8, 2021 ($12K) were unusually high for a Monday (normal average = $6.7K).
Sales on Feb 25, 2021 ($10K) were unusually low for a Thursday (normal average = $21.3K).
Note
All examples above use sum aggregation, which applies by default. Remember that you can pick a different aggregation type — e.g. mean() or count() — using the optional parameter.