outlier Search Operator
Given a series of time-stamped numerical values, the outlier operator identifies values in the sequence that seem unexpected. In a scheduled search, for example, such values can be used to flag an alert or violation.
To do this, the outlier operator tracks the moving average and standard deviation of a numerical field. An outlier is identified based on a specified threshold of standard deviations around the expected value. If a data point is outside the threshold, it is considered to be an outlier.
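The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the operator's actual implementation: the function name `is_outlier` is hypothetical, and using a population standard deviation over the trailing values is an assumption, since this page does not say how sigma is computed.

```python
from statistics import mean, pstdev


def is_outlier(history, value, threshold=3.0):
    """Return True if `value` lies more than `threshold` standard
    deviations away from the mean of the trailing `history` values.

    Assumption: sigma is the population standard deviation of `history`.
    """
    mu = mean(history)
    sigma = pstdev(history)
    return abs(value - mu) > threshold * sigma


history = [10, 12, 11, 9, 10]
print(is_outlier(history, 11))  # close to the recent average
print(is_outlier(history, 30))  # far above the recent average
```

With the sample history, 11 falls inside the band and 30 falls well outside it.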
Syntax
...
| timeslice <time_period>
| <aggregate operator> as <field> by _timeslice
| outlier <field> [window=<#>, threshold=<#>, consecutive=<#>, direction=<+->]
...
| timeslice <time_period>
| <aggregate operator> by _timeslice, <field>
| outlier <_aggregate> by <field> [window=<#>, threshold=<#>, consecutive=<#>, direction=<+->]
A timeslice is required.
The second syntax example uses an additional “group by” clause to find outliers for multiple values of a field. See the example below for details.
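One way to picture the group-by form is that the aggregated rows are split into one time series per field value before any outlier check runs. The Python sketch below illustrates only that splitting step. The function name `split_by_group` and the row layout `(timeslice, group, value)` are hypothetical, and treating each group as an independent series is an assumption drawn from the description above.

```python
from collections import defaultdict


def split_by_group(rows):
    """Split (timeslice, group, value) rows into one ordered series per group.

    Assumption: each group value is evaluated as its own time series.
    """
    series = defaultdict(list)
    for timeslice, group, value in sorted(rows):
        series[group].append((timeslice, value))
    return dict(series)


rows = [(2, "a", 5), (1, "a", 3), (1, "b", 7)]
print(split_by_group(rows))
# {'a': [(1, 3), (2, 5)], 'b': [(1, 7)]}
```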
The following table lists the fields returned in outlier results:
Field | Description |
---|---|
<field>_error | This is the <field> - mean. |
<field>_lower | This is the mean - threshold*standard deviation. |
<field>_upper | This is the mean + threshold*standard deviation. |
<field>_indicator | This is either 0 or 1. It is set to 1 for a data point outside of the lower and upper boundaries, that is, a point observed farther than the specified number of standard deviations from the rolling average. Such a point is called an indicator. |
<field>_violation | This is either 0 or 1. It is set to 1 when the specified number of consecutive indicators is reached. A run of indicators that reaches this count is what the operator reports as an outlier. |
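The formulas in the table can be worked through for a single data point. The Python sketch below is illustrative only: `band_fields` is a hypothetical helper, the dictionary keys simply mirror the field suffixes, and the population standard deviation is an assumption. The violation field is left out because it depends on neighboring points rather than on one point alone.

```python
from statistics import mean, pstdev


def band_fields(trailing, value, threshold=3.0):
    """Compute error, lower, upper, and indicator for one data point.

    Assumption: mean and sigma come from the trailing values only,
    and sigma is the population standard deviation.
    """
    mu = mean(trailing)
    sigma = pstdev(trailing)
    lower = mu - threshold * sigma  # mean - threshold * standard deviation
    upper = mu + threshold * sigma  # mean + threshold * standard deviation
    return {
        "error": value - mu,  # <field> - mean
        "lower": lower,
        "upper": upper,
        "indicator": 1 if (value < lower or value > upper) else 0,
    }


fields = band_fields([10, 12, 11, 9, 10], 30)
print(fields["indicator"])  # 1, because 30 is above the upper boundary
```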
You can configure options by setting parameters through keyword arguments, such as window, threshold, consecutive, and direction.
Keyword Argument | Description |
---|---|
window | Sets the trailing number of data points to calculate mean and sigma. The default is 10. |
threshold | Sets the number of standard deviations for calculating violations. The default is 3.0. |
consecutive | Sets the required number of consecutive indicator data points (outliers) to trigger a violation. The default is 1. |
direction | Use +-, +, or - to specify which direction should trigger violations: + for positive deviations, - for negative deviations, or +- for positive or negative deviations. |
For example, this query would set the following parameters:
... | outlier <field> window=5,threshold=3,consecutive=2,direction=+-
- window=5 : Use the trailing 5 data points to calculate mean and sigma.
- threshold=3 : Calculate violation based on +/- 3 standard deviations.
- consecutive=2 : Trigger a violation by returning <field>_violation=1 in the search results only if 2 or more consecutive indicator data points occur.
- direction=+- : Uses positive or negative deviations.
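How the consecutive and direction settings interact can be sketched in Python. Both helper names are hypothetical, and the exact counting behavior (every point in a run is marked once the run reaches the required length) is an assumption, not something this page specifies.

```python
def indicator(value, lower, upper, direction="+-"):
    """Return 1 if `value` breaches the band in an enabled direction."""
    above = value > upper
    below = value < lower
    if direction == "+":
        return int(above)
    if direction == "-":
        return int(below)
    if direction == "+-":
        return int(above or below)
    raise ValueError(f"unknown direction: {direction!r}")


def violations(indicators, consecutive=1):
    """Mark a violation once `consecutive` indicators occur in a row."""
    out = []
    run = 0  # length of the current run of indicator points
    for flag in indicators:
        run = run + 1 if flag else 0
        out.append(1 if flag and run >= consecutive else 0)
    return out


flags = [0, 1, 0, 1, 1, 1, 0]
print(violations(flags, consecutive=2))  # [0, 0, 0, 0, 1, 1, 0]
```

With consecutive=2, the isolated indicator at the second position does not produce a violation, while the run of three does from its second point onward.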
Rules
- The outlier operator must appear after a group by aggregator, such as count, min, max, or sum.
- The original target field must be numeric.
- A timeslice is required.
Limitations
- Because the most recent time bucket in a query may have incomplete data, it is ignored by outlier. Consequently, if an alert is set to trigger on <field>_violation changing to 1, this alert will trigger one timeslice later.
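The effect of this limitation can be pictured as dropping the newest bucket before any evaluation takes place. The Python sketch below is only an illustration of that idea. The helper name `completed_buckets` and the `(timeslice_start, value)` tuple layout are hypothetical.

```python
def completed_buckets(buckets):
    """Return buckets ordered by start time, without the most recent one.

    Each bucket is a (timeslice_start, value) tuple. The newest bucket may
    still be filling, so it is excluded from evaluation.
    """
    ordered = sorted(buckets, key=lambda bucket: bucket[0])
    return ordered[:-1]


buckets = [(300, 4), (0, 2), (600, 9)]
print(completed_buckets(buckets))  # [(0, 2), (300, 4)]
```

A spike in the bucket starting at 600 would therefore only be evaluated once a later bucket exists, which is the one-timeslice delay described above.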
Examples
IIS logs
Run the following query to find outlier values in IIS logs over the last 6 hours.
_sourceCategory=IIS/Access
| parse regex "\d+-\d+-\d+ \d+:\d+:\d+ (?<server_ip>\S+) (?<method>\S+) (?<cs_uri_stem>/\S+?) \S+ \d+ (?<user>\S+) (?<client_ip>[\.\d]+) "
| parse regex "\d+ \d+ \d+ (?<response_time>\d+)$"
| timeslice 15m
| max(response_time) as response_time by _timeslice
| outlier response_time window=5,threshold=3,consecutive=2,direction=+-
The outlier values are represented by the pink triangles in the resulting chart.
Apache logs - Server Errors Over Time
Run the following query to find outlier values in Apache logs over the last 3 hours.
_sourceCategory=Apache/Access
| parse "HTTP/1.1\" * " as status_code
| where status_code matches "5*"
| timeslice 5m
| count(status_code) as status_code by _timeslice
| outlier status_code window=5,threshold=3,consecutive=1,direction=+-
The outlier values are represented by the pink triangles in the resulting chart.
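For readers who want to see what the timeslice and count steps produce before outlier runs, the Python sketch below buckets event timestamps into fixed 5-minute slices and counts them. The function name `timeslice_counts` is hypothetical, and aligning buckets to multiples of the slice length is an assumption made for the illustration.

```python
from collections import Counter


def timeslice_counts(epoch_seconds, slice_seconds=300):
    """Count events per fixed-width time bucket.

    Each timestamp is floored to the start of its bucket. Buckets with no
    events do not appear in the result.
    """
    counts = Counter(
        (t // slice_seconds) * slice_seconds for t in epoch_seconds
    )
    return dict(sorted(counts.items()))


print(timeslice_counts([0, 10, 299, 300, 601]))
# {0: 3, 300: 1, 600: 1}
```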