'Existence Ratio Drift' verses configure Mona to find segments in which a KAPI's existence ratio (the number of contexts it appears in divided by the total number of contexts in the segment) differs significantly between a target data set and a benchmark data set.
In the standard approach, we check whether the difference between the existence ratio in the target set and the existence ratio in the benchmark set is larger than a given multiplier times the standard deviation of the existence values distribution (1 if the metric appears in the context, and 0 if it doesn't) in the united benchmark and target sets.
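The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the use of the population standard deviation (rather than the sample one) over the united set is our assumption, not something this page specifies.

```python
from statistics import pstdev


def existence_ratio(flags):
    """Fraction of contexts in which the metric exists (flags are 0/1)."""
    return sum(flags) / len(flags)


def std_drift_level(target_flags, benchmark_flags):
    """Ratio difference, in units of the united set's standard deviation."""
    # Assumption: population std over the united 0/1 existence values.
    united_std = pstdev(target_flags + benchmark_flags)
    if united_std == 0:
        return 0.0  # metric exists everywhere or nowhere, so there is no drift
    diff = existence_ratio(target_flags) - existence_ratio(benchmark_flags)
    return abs(diff) / united_std


def is_drift(target_flags, benchmark_flags, multiplier):
    """True when the difference exceeds multiplier * std."""
    return std_drift_level(target_flags, benchmark_flags) > multiplier


target = [1] * 10 + [0] * 90      # existence ratio 0.1
benchmark = [1] * 40 + [0] * 60   # existence ratio 0.4
print(round(std_drift_level(target, benchmark), 3))
```

With these made-up sets, a multiplier of 0.5 flags a drift while a multiplier of 1.0 does not.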
<ins>Advanced section:</ins>
No-STD-mode: This approach can be activated above a certain size, because the STD-aware approach biases against large segments.
In the no-STD-mode, we measure the difference between target and benchmark as an absolute percentage change, regardless of the variance of the existence values. Since values may have reference points other than -1, other values may also be supplied as reference points: for each reference value, we measure the distance of both the target's and the benchmark's existence ratios from it, and evaluate the percent change of the target's diff relative to the benchmark's diff. If more than one reference value is over the min percentage threshold, the value that accounts for the largest change is chosen. The anomaly_level is the change, as a fraction of the benchmark's diff, for the chosen reference value.
For example, if the reference values are 0 and 1, and we get existence ratios of 0.4 in the benchmark and 0.1 in the target, the min anomaly level required to create a signal is 0.5 (because the diffs from 1 are 0.6 and 0.9 respectively, which is a 50% change). If only 0 is supplied as reference, the required min anomaly level will be 0.75.
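The arithmetic of this example can be reproduced with a short sketch. The function name is hypothetical and the code only mirrors the per-reference calculation described above; it does not claim to be Mona's implementation.

```python
def relative_change(benchmark_ratio, target_ratio, reference):
    """Change of the target's diff from `reference`, as a fraction of the benchmark's diff."""
    benchmark_diff = abs(benchmark_ratio - reference)
    target_diff = abs(target_ratio - reference)
    if benchmark_diff == 0:
        # Assumption: any move away from a zero benchmark diff is an unbounded change.
        return float("inf") if target_diff else 0.0
    return abs(target_diff - benchmark_diff) / benchmark_diff


# Benchmark existence ratio 0.4, target existence ratio 0.1.
changes = {ref: relative_change(0.4, 0.1, ref) for ref in (0, 1)}
for ref, change in changes.items():
    print(f"reference {ref}: change {change:.2f}")
```

Measured from 1 the diffs are 0.6 and 0.9, a change of 0.50; measured from 0 they are 0.4 and 0.1, a change of 0.75.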
In this example we see an ExistenceRatioDrift verse which is configured to search for statistically significant decreases (and not increases, due to the "trend_directions" param) in the existence_ratio of "confidence_score" and "failed_classification" in any specific "company_id" or "country" (or any intersection of country and company), between a "target" dataset from the last 7 days and a "benchmark" dataset from the 28 days prior to that.
We use "min_anomaly_level" to define that a drift occurs when the change in existence ratios between the benchmark and target sets is at least 0.5 standard deviations of the existence values distribution.
The "min_segment_size" param will filter out segments that are smaller than 5% of the data.
Lastly, "time_resolution" will configure the resolution of the time series chart in the insight's page for insights created by this configuration.
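As a rough illustration of the configuration described here, the sketch below builds a verse body as a Python dict and prints it as JSON. It is an assumption-laden sketch: only parameter names documented on this page are used, the verse name and the "time_resolution" value are illustrative, the list-of-strings shape of "metrics" and "segment_by" is assumed, and the 5% filter is expressed with "min_segment_size_fraction" because its description matches that behavior. The keys that select the verse type and the 7-day and 28-day windows are not named in this section, so they are left out rather than guessed.

```python
import json

# Hypothetical verse body; see the assumptions listed above.
verse = {
    "name": "existence_ratio_decrease",  # illustrative name
    "metrics": ["confidence_score", "failed_classification"],
    "segment_by": ["company_id", "country"],
    "trend_directions": ["desc"],        # decreases only
    "min_anomaly_level": 0.5,            # 0.5 standard deviations
    "min_segment_size_fraction": 0.05,   # skip segments under 5% of the data
    "time_resolution": "1d",             # illustrative value
}

print(json.dumps(verse, indent=2))
```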
Basic Params
cadence
Name
Description
Type
Default
cadence
The cadence for evaluation of this verse. Only the following cadences are valid: Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w.
Cadence
1d
{ "cadence": "6h" }
default_urgency
Name
Description
Type
Default
default_urgency
The urgency class for insights created using this verse. Currently supports two values: "normal" (default) and "high". If set to "normal", then specific thresholds for "high" urgency can be set using other parameters prefixed with "high_urgency_". If set to "high", then threshold parameters prefixed with "high_urgency_" are not considered at all, since all insights of this verse will be considered as having a "high" urgency.
Urgency
normal
{ "default_urgency": "high" }
description
Name
Description
Type
Default
description
Verse description.
str
{ "description": "searches for asc drifts in output_score" }
metrics
Name
Description
Type
Default
metrics
Relevant metrics to search anomalies for in the verse; relevant only for verse types that search for anomalies in metrics behavior.
min_anomaly_level
Name
Description
Type
Default
min_anomaly_level
This parameter sets the threshold for the minimal anomaly level for which an insight will be generated. Anomaly Level of this verse is the difference between the last point's value to the median value, normalized by the difference between the top_percentile_benchmark and the median.
PositiveFloat
2.5
{ "min_anomaly_level": 2 }
min_segment_size
Name
Description
Type
Default
min_segment_size
Minimal segment size to require for the entire time-series.
PositiveInt
5
{ "min_segment_size": 10 }
min_segment_size_fraction
Name
Description
Type
Default
min_segment_size_fraction
Minimal segment size, as a fraction of the baseline segment, which a segment must have in order to be considered in the search.
InclusiveFraction
0
{ "min_segment_size_fraction": 0.05 }
name
Name
Description
Type
Default
name
(Required) The name of the verse. Please note, a verse's name must be different from other verses in the same stanza.
str
None
{ "name": "confidence_outliers" }
segment_by
Name
Description
Type
Default
segment_by
The dimensions to use to segment the data in order to search for anomalies. This list must be a sublist of all arc class' dimensions. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example.
time_resolution
Name
Description
Type
Default
time_resolution
Time series time resolution period. Expected format is a positive integer followed by a unit; the units currently supported are "d" (days) and "w" (weeks). E.g., "1d" means a 1 day period.
TimeResolution
1d
{ "time_resolution": "1w" }
time_series_points
Name
Description
Type
Default
time_series_points
Size of desired entire time series.
PositiveInt
60
{ "time_series_points": 30 }
trend_directions
Name
Description
Type
Default
trend_directions
A list of allowed anomaly trend directions - either 'asc' for ascending (anomalies in which the found value is LARGER THAN the relevant benchmark), or 'desc' for descending (anomalies in which the found value is SMALLER THAN the relevant benchmark).
TrendDirections
('asc', 'desc')
{ "trend_directions": [ "asc" ] }
Advanced Misc Params
avoid_same_field_for_segment_and_metric
Name
Description
Type
Default
avoid_same_field_for_segment_and_metric
If True, insights would not be created for segments based on the same field as the given metric.
cookbook
Name
Description
Type
Default
cookbook
Instructions on how to read an insight generated by this verse. Expected format is MarkDown.
Cookbook
{ "cookbook": "Use **this param** to add instructions using [markdown](https://daringfireball.net/projects/markdown/syntax) syntax on how to read insights generated from this `verse`, and what should the insight recipient do with it." }
create_extra_adjacent_signals
Name
Description
Type
Default
create_extra_adjacent_signals
If set to true (default), will cause Mona to create new signals from existing signals with adjacent numeric segments. So if there are two signals defined on 1 <= x < 2 and 2 <= x < 3 - Mona will automatically create a new signal with 1 <= x < 3. This will allow the later clustering algorithm to create an insight with the most relevant segment for its main signal.
bool
True
{ "create_extra_adjacent_signals": false }
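The merging of adjacent numeric segments described for this param can be pictured with a small sketch. The function and the tuple representation of a range are hypothetical and only illustrate the 1 <= x < 2 and 2 <= x < 3 example.

```python
def merge_adjacent(ranges):
    """Build extra ranges by joining pairs where one range ends where another starts.

    Each range is a (low, high) tuple meaning low <= x < high.
    """
    extra = []
    for low_a, high_a in ranges:
        for low_b, high_b in ranges:
            if high_a == low_b:
                extra.append((low_a, high_b))
    return extra


signals = [(1, 2), (2, 3)]
print(merge_adjacent(signals))  # [(1, 3)]
```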
disabled
Name
Description
Type
Default
disabled
If set to True - this verse won't be used when searching for new insights.
bool
False
{ "disabled": true }
expire_after
Name
Description
Type
Default
expire_after
Insights detected by this verse will continue to be considered active for at least this amount of time after the last time they were detected.
TimePeriodOrEmpty
3d
{ "expire_after": "2d" }
relevant_data_time_buffer
Name
Description
Type
Default
relevant_data_time_buffer
Adds an end-time buffer to the insight generation. For example - If this param's value is "1d", then insights are generated for a day before the latest received data. This is useful for processes in which it takes a specific period of time to get all the healthy monitoring data in place.
TimePeriodOrEmpty
{ "relevant_data_time_buffer": "1d" }
timestamp_field_name
Name
Description
Type
Default
timestamp_field_name
The field that is used as the time dimension for insight generation.
score_anomaly_level_exponent
Name
Description
Type
Default
score_anomaly_level_exponent
An exponent to put on the anomaly level in the score after multiplying it by the given multiplier.
float
1
{ "score_anomaly_level_exponent": 0.5 }
score_anomaly_level_multiplier
Name
Description
Type
Default
score_anomaly_level_multiplier
Multiplier for an anomaly level to use before using the exponent.
float
1
{ "score_anomaly_level_multiplier": 1.2 }
score_segment_size_exponent
Name
Description
Type
Default
score_segment_size_exponent
An exponent to put on the segment's size (or relative size) in the combined score. If score_segment_size_log_base is not 0, the exponent will be applied before the logarithm will.
float
0.5
{ "score_segment_size_exponent": 1.5 }
score_segment_size_log_base
Name
Description
Type
Default
score_segment_size_log_base
Changes the log base used for the segment's size (or relative size) in the combined score, or removes the log altogether when set to 0. Unless it is 0, this value must be larger than 1.
float
0
{ "score_segment_size_log_base": 5 }
score_use_segment_absolute_size
Name
Description
Type
Default
score_use_segment_absolute_size
If true, use the segment absolute size in the combined score, otherwise use the segment's size relative to its baseline (fraction).
bool
True
{ "score_use_segment_absolute_size": false }
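The order of operations described by the score_* params can be sketched as two separate terms. The function names are hypothetical, the defaults are the ones listed above, and how the two terms are finally combined into one score is not stated on this page, so the sketch stops short of that.

```python
import math


def anomaly_term(anomaly_level, multiplier=1.0, exponent=1.0):
    """Multiply the anomaly level first, then apply the exponent."""
    return (anomaly_level * multiplier) ** exponent


def size_term(size, exponent=0.5, log_base=0.0):
    """Apply the exponent first, then the logarithm unless log_base is 0."""
    powered = size ** exponent
    if log_base == 0:
        return powered
    if log_base <= 1:
        raise ValueError("log base must be 0 or larger than 1")
    return math.log(powered, log_base)


print(anomaly_term(2, multiplier=1.2, exponent=0.5))
print(size_term(10000))               # exponent only
print(size_term(10000, log_base=10))  # exponent, then log base 10
```

Here `size` stands for either the absolute or the relative segment size, depending on score_use_segment_absolute_size.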
top_percentile_benchmark
Name
Description
Type
Default
top_percentile_benchmark
Defines the top/bottom percentile to search for in the time-series (after the last point is removed) to serve as a benchmark for what will be defined as a large difference from the median (to that top/bottom direction). For verses with 'desc' in trend_directions the bottom percentile is used as is. For verses with 'asc' in trend_directions the top percentile is (100 - top_percentile_benchmark). For example, if the top fraction percentile is 5 then for bottom threshold 5th percentile is used as benchmark, whereas for top threshold the 95th percentile is used as benchmark.
PositiveFloat
5
{ "top_percentile_benchmark": 10 }
Anomaly Thresholds Params
epsilon
Name
Description
Type
Default
epsilon
Minimal required absolute difference between value and benchmark. Used to account for statistical errors in stable time series.
NonNegativeFloat
0.01
{ "epsilon": 0.5 }
high_urgency_min_anomaly_level
Name
Description
Type
Default
high_urgency_min_anomaly_level
Threshold for separating between "high" and "normal" urgency insights with regards to min_anomaly_level. See "min_anomaly_level" param for more details on the functionality of this param.
PositiveFloatOrNone
None
{ "high_urgency_min_anomaly_level": 1.5 }
high_urgency_min_score
Name
Description
Type
Default
high_urgency_min_score
Threshold for separating between "high" and "normal" urgency insights with regards to min_score. See "min_score" param for more details on the functionality of this param.
FloatOrNone
None
{ "high_urgency_min_score": 20 }
min_anomaly_level
Name
Description
Type
Default
min_anomaly_level
This parameter sets the threshold for the minimal anomaly level for which an insight will be generated. Anomaly Level of this verse is the difference between the last point's value to the median value, normalized by the difference between the top_percentile_benchmark and the median.
PositiveFloat
2.5
{ "min_anomaly_level": 2 }
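The anomaly level formula described by min_anomaly_level and top_percentile_benchmark can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, percentiles are computed with Python's inclusive quantile method (Mona's exact percentile method is not stated here), and only whole-number percentiles are handled.

```python
from statistics import median, quantiles


def anomaly_level(series, top_percentile_benchmark=5, direction="asc"):
    """(last - median) / (percentile benchmark - median), last point excluded from the history."""
    history, last = series[:-1], series[-1]
    mid = median(history)
    cuts = quantiles(history, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    pct = 100 - top_percentile_benchmark if direction == "asc" else top_percentile_benchmark
    benchmark = cuts[int(pct) - 1]
    if benchmark == mid:
        raise ValueError("flat history: percentile benchmark equals the median")
    return (last - mid) / (benchmark - mid)


history = list(range(1, 102))  # median 51, 95th percentile 96, 5th percentile 6
print(anomaly_level(history + [141], direction="asc"))
print(anomaly_level(history + [-39], direction="desc"))
```

An insight would be generated when the returned level is at least min_anomaly_level.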
min_score
Name
Description
Type
Default
min_score
The minimal score for a signal to be considered as an anomaly.
float
0
{ "min_score": 4 }
Data Filtering Params
avoid_segmenting_on_missing
Name
Description
Type
Default
avoid_segmenting_on_missing
When true, insights will not be generated for segments which are (partially or fully) defined by a missing field.
bool
False
{ "avoid_segmenting_on_missing": true }
baseline_segment
Name
Description
Type
Default
baseline_segment
The baseline segment of this verse. This segment defines "the world" as far as this verse is concerned. Only data from this segment will be considered when finding insights.
enhance_exclude_segments
Name
Description
Type
Default
enhance_exclude_segments
If True, when exclude segments are added to any level of configuration (either in the verse, the stanza or the stanzas_global_defaults) they are ADDED to the excluded segments of higher level defaults, if exists any. For example, if we have in stanzas_global_default a single excluded segment of {dimensionA: MISSING}, and the stanza (or verse) has a single excluded segment of {dimensionB: 0}, then if enhance_exclude_segments is True, the excluded segments will include both {dimensionA: MISSING} and {dimensionB: 0} and will filter either one. Otherwise (if enhance_exclude_segments is False) it will be overridden to just the one segment in the verse {dimensionB: 0}.
bool
False
{ "enhance_exclude_segments": true }
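The add-versus-override behavior described for enhance_exclude_segments can be illustrated with a minimal sketch. The function name and the dict representation of a segment are hypothetical; the values follow the dimensionA and dimensionB example above.

```python
def effective_exclude_segments(higher_level, current_level, enhance_exclude_segments):
    """Return the excluded segments that apply at the current configuration level."""
    if enhance_exclude_segments:
        return higher_level + current_level  # added to the higher-level defaults
    return current_level                     # overrides the higher-level defaults


global_default = [{"dimensionA": "MISSING"}]
verse_level = [{"dimensionB": 0}]

print(effective_exclude_segments(global_default, verse_level, True))
print(effective_exclude_segments(global_default, verse_level, False))
```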
exclude_segments
Name
Description
Type
Default
exclude_segments
Segments to exclude from the baseline of this verse. The data we search in will not include these segments, neither in the tested segments nor in any benchmarks used to find the anomalies. Notice that whether or not this param overrides the exclude_segments definitions of other levels is decided by enhance_exclude_segments.
A list of fields to avoid checking for correlated anomalies to the main anomaly in a generated insight. See "find_related_anomalies_for" for further details.
find_related_anomalies_for
Name
Description
Type
Default
find_related_anomalies_for
A list of fields to check for correlated anomalies to the main anomaly in a generated insight. These correlated anomalies might help with understanding the possible cause of an insight. Leave empty to search in all fields.
related_anomalies_min_correlation
Name
Description
Type
Default
related_anomalies_min_correlation
Minimal Pearson correlation between the metric on which an anomaly was found and another metric with an anomaly on the same segment, below which Mona will not use the other metric as a related anomaly.
NonNegativeFloat
0.3
{ "related_anomalies_min_correlation": 0.5 }
Required Params
name
Name
Description
Type
Default
name
(Required) The name of the verse. Please note, a verse's name must be different from other verses in the same stanza.
str
None
{ "name": "confidence_outliers" }
Segmentation Params
always_segment_baseline_by
Name
Description
Type
Default
always_segment_baseline_by
A list of dimensions to always segment the baseline segment by. This is useful when separating the world to completely unrelated parts - e.g., in a case where you have a different model developed for each customer and there's no need to look for insights across different customers. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example.
avoid_segmenting_on_missing
Name
Description
Type
Default
avoid_segmenting_on_missing
When true, insights will not be generated for segments which are (partially or fully) defined by a missing field.
bool
False
{ "avoid_segmenting_on_missing": true }
max_segment_baseline_by_depth
Name
Description
Type
Default
max_segment_baseline_by_depth
The maximum number of fields Mona should combine for segmenting the baseline (if "segment_baseline_by" fields given).
PositiveInt
2
{ "max_segment_baseline_by_depth": 3 }
max_segment_by_depth
Name
Description
Type
Default
max_segment_by_depth
The maximum number of fields Mona should combine to create sub-segments to search in. Baseline segment fields and parent fields are "free", and are not counted for depth. Notice this parameter has an exponential effect on performance and should be kept within SLAs.
PositiveInt
2
{ "max_segment_by_depth": 3 }
min_segment_baseline_by_depth
Name
Description
Type
Default
min_segment_baseline_by_depth
The minimum number of fields Mona should combine for segmenting the baseline (if "segment_baseline_by" fields given).
NonNegativeInt
0
{ "min_segment_baseline_by_depth": 1 }
min_segment_by_depth
Name
Description
Type
Default
min_segment_by_depth
The minimum number of fields Mona should combine to create sub-segments to search in.
NonNegativeInt
0
{ "min_segment_by_depth": 1 }
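To get a feel for how the depth params bound the search space, the sketch below enumerates the field combinations between a minimum and a maximum depth. The function name is hypothetical, treating depth 0 as "no additional segmentation field" is our assumption, and the count covers field combinations only, not the individual value segments each combination produces.

```python
from itertools import combinations


def field_combinations(fields, min_depth=0, max_depth=2):
    """All combinations of segmentation fields with min_depth..max_depth fields."""
    combos = []
    for depth in range(min_depth, max_depth + 1):
        combos.extend(combinations(fields, depth))
    return combos


print(field_combinations(["company_id", "country"], min_depth=1, max_depth=2))
# [('company_id',), ('country',), ('company_id', 'country')]

fields = ["company_id", "country", "model_version", "device"]  # illustrative fields
print(len(field_combinations(fields, min_depth=1, max_depth=2)))  # 10
print(len(field_combinations(fields, min_depth=1, max_depth=3)))  # 14
```

The number of combinations grows quickly with the maximum depth, which is why it should be raised with care.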
segment_baseline_by
Name
Description
Type
Default
segment_baseline_by
A list of dimensions to potentially segment the baseline segment by. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object.
SegmentationsList
()
{ "segment_baseline_by": [ "model_version" ] }
segment_by
Name
Description
Type
Default
segment_by
The dimensions to use to segment the data in order to search for anomalies. This list must be a sublist of all arc class' dimensions. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example.
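high_urgency_baseline_min_segment_size
Name
Description
Type
Default
high_urgency_baseline_min_segment_size
Threshold for separating between "high" and "normal" urgency insights with regards to baseline_min_segment_size. See "baseline_min_segment_size" param for more details on the functionality of this param.
high_urgency_min_culprit_size
Name
Description
Type
Default
high_urgency_min_culprit_size
Threshold for separating between "high" and "normal" urgency insights with regards to min_culprit_size. See "min_culprit_size" param for more details on the functionality of this param.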
NonNegativeFloatOrNone
None
{ "high_urgency_min_culprit_size": 500 }
high_urgency_min_segment_size
Name
Description
Type
Default
high_urgency_min_segment_size
Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size. See "min_segment_size" param for more details on the functionality of this param.
PositiveIntOrNone
None
{ "high_urgency_min_segment_size": 1000 }
high_urgency_min_segment_size_fraction
Name
Description
Type
Default
high_urgency_min_segment_size_fraction
Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size_fraction. See "min_segment_size_fraction" param for more details on the functionality of this param.
InclusiveFractionOrNone
None
{ "high_urgency_min_segment_size_fraction": 0.2 }
max_segment_size
Name
Description
Type
Default
max_segment_size
Maximal segment size which a segment must have (bigger segments won't be considered in the search). Leave empty to not have such a threshold.
PositiveIntOrNone
None
{ "max_segment_size": 10000 }
max_segment_size_fraction
Name
Description
Type
Default
max_segment_size_fraction
Maximal segment size in fraction from baseline segment, which a segment must have. Leave empty to not have such a threshold.
NonInclusiveFractionOrNone
None
{ "max_segment_size_fraction": 0.2 }
min_culprit_size
Name
Description
Type
Default
min_culprit_size
Minimal absolute size (number of relevant contexts) of the checked point (latest).
NonNegativeFloat
0
{ "min_culprit_size": 50 }
min_exist_freq
Name
Description
Type
Default
min_exist_freq
The minimum fraction of the time-series frames in which the segment had any data. Why do we need this? Verses might rely on data that only exists in a few days (or in a few frames of any other time resolution). These cases are usually much less stable and noisier. Usually they are not interesting and should be filtered out (with this param). Use cases that expect this behavior should reduce this value significantly.
InclusiveFraction
0
{ "min_exist_freq": 0.5 }
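The filter described by min_exist_freq can be pictured with a short sketch. The function names are hypothetical, and whether the comparison against the threshold is inclusive is our assumption.

```python
def exist_freq(frame_sizes):
    """Fraction of time-series frames in which the segment had any data."""
    return sum(1 for size in frame_sizes if size > 0) / len(frame_sizes)


def passes_min_exist_freq(frame_sizes, min_exist_freq=0.0):
    """Assumption: a segment passes when its frequency is at least the threshold."""
    return exist_freq(frame_sizes) >= min_exist_freq


# Number of contexts per daily frame; data exists in 3 of the 8 frames.
frame_sizes = [12, 0, 0, 7, 0, 0, 0, 3]
print(exist_freq(frame_sizes))                  # 0.375
print(passes_min_exist_freq(frame_sizes, 0.5))  # False
```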
min_point_size
Name
Description
Type
Default
min_point_size
The minimal absolute size (number of relevant contexts) of all the timeseries points except the checked point. Points under this threshold will be ignored when calculating the percentiles of the time series.
NonNegativeInt
0
{ "min_point_size": 20 }
min_segment_size
Name
Description
Type
Default
min_segment_size
Minimal segment size to require for the entire time-series.
PositiveInt
5
{ "min_segment_size": 10 }
min_segment_size_fraction
Name
Description
Type
Default
min_segment_size_fraction
Minimal segment size, as a fraction of the baseline segment, which a segment must have in order to be considered in the search.
InclusiveFraction
0
{ "min_segment_size_fraction": 0.05 }
Time Related Params
cadence
Name
Description
Type
Default
cadence
The cadence for evaluation of this verse. Only the following cadences are valid: Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w.
Cadence
1d
{ "cadence": "6h" }
expire_after
Name
Description
Type
Default
expire_after
Insights detected by this verse will continue to be considered active for at least this amount of time after the last time they were detected.
TimePeriodOrEmpty
3d
{ "expire_after": "2d" }
relevant_data_time_buffer
Name
Description
Type
Default
relevant_data_time_buffer
Adds an end-time buffer to the insight generation. For example - If this param's value is "1d", then insights are generated for a day before the latest received data. This is useful for processes in which it takes a specific period of time to get all the healthy monitoring data in place.
TimePeriodOrEmpty
{ "relevant_data_time_buffer": "1d" }
time_resolution
Name
Description
Type
Default
time_resolution
Time series time resolution period. Expected format is a positive integer followed by a unit; the units currently supported are "d" (days) and "w" (weeks). E.g., "1d" means a 1 day period.
TimeResolution
1d
{ "time_resolution": "1w" }
timestamp_field_name
Name
Description
Type
Default
timestamp_field_name
The field that is used as the time dimension for insight generation.
default_urgency
Name
Description
Type
Default
default_urgency
The urgency class for insights created using this verse. Currently supports two values: "normal" (default) and "high". If set to "normal", then specific thresholds for "high" urgency can be set using other parameters prefixed with "high_urgency_". If set to "high", then threshold parameters prefixed with "high_urgency_" are not considered at all, since all insights of this verse will be considered as having a "high" urgency.
Urgency
normal
{ "default_urgency": "high" }
high_urgency_baseline_min_segment_size
Name
Description
Type
Default
high_urgency_baseline_min_segment_size
Threshold for separating between "high" and "normal" urgency insights with regards to baseline_min_segment_size. See "baseline_min_segment_size" param for more details on the functionality of this param.
high_urgency_min_anomaly_level
Name
Description
Type
Default
high_urgency_min_anomaly_level
Threshold for separating between "high" and "normal" urgency insights with regards to min_anomaly_level. See "min_anomaly_level" param for more details on the functionality of this param.
PositiveFloatOrNone
None
{ "high_urgency_min_anomaly_level": 1.5 }
high_urgency_min_culprit_size
Name
Description
Type
Default
high_urgency_min_culprit_size
Threshold for separating between "high" and "normal" urgency insights with regards to min_culprit_size. See "min_culprit_size" param for more details on the functionality of this param.
NonNegativeFloatOrNone
None
{ "high_urgency_min_culprit_size": 500 }
high_urgency_min_score
Name
Description
Type
Default
high_urgency_min_score
Threshold for separating between "high" and "normal" urgency insights with regards to min_score. See "min_score" param for more details on the functionality of this param.
FloatOrNone
None
{ "high_urgency_min_score": 20 }
high_urgency_min_segment_size
Name
Description
Type
Default
high_urgency_min_segment_size
Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size. See "min_segment_size" param for more details on the functionality of this param.
PositiveIntOrNone
None
{ "high_urgency_min_segment_size": 1000 }
high_urgency_min_segment_size_fraction
Name
Description
Type
Default
high_urgency_min_segment_size_fraction
Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size_fraction. See "min_segment_size_fraction" param for more details on the functionality of this param.
InclusiveFractionOrNone
None
{ "high_urgency_min_segment_size_fraction": 0.2 }
high_urgency_require_all_criteria
Name
Description
Type
Default
high_urgency_require_all_criteria
Whether to use an 'AND' or an 'OR' condition between all high_urgency threshold params.
bool
True
{ "high_urgency_require_all_criteria": false }
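A minimal sketch of how such a switch might combine the individual high_urgency checks is shown below. The function name is hypothetical, and mapping True to 'AND' and False to 'OR' is an assumption read from the param's name; this page does not spell the mapping out.

```python
def is_high_urgency(checks, require_all_criteria=True):
    """Combine one boolean per configured high_urgency_* threshold.

    Assumption: True means every threshold must pass ('AND'),
    False means a single passing threshold is enough ('OR').
    """
    if not checks:
        return False  # no high_urgency thresholds configured
    return all(checks) if require_all_criteria else any(checks)


# E.g. the anomaly level threshold passed but the segment size threshold did not.
checks = [True, False]
print(is_high_urgency(checks, require_all_criteria=True))   # False
print(is_high_urgency(checks, require_all_criteria=False))  # True
```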
Visuals and Enrichments Params
field_vectors
Name
Description
Type
Default
field_vectors
This attribute lists metric vectors for the FE to show on an insight card of this verse. A value in this field can either be a string (in which case the string should correspond to a kapi_vector name in the config) or an array (in which case the array should be treated as an ad-hoc kapi vector defined specifically for this verse).
investigate_no_drill
Name
Description
Type
Default
investigate_no_drill
Dictates the link to the investigations page to add to the found insights. If True, the link will point to investigations page with a drilldown to the segment that was found. If it's false the link will point to the investigations page without drilldown, but with the found segment selected so it can be compared with a benchmark of a higher level.
bool
False
{ "investigate_no_drill": true }
time_resolution
Name
Description
Type
Default
time_resolution
Time series time resolution period. Expected format is a positive integer followed by a unit; the units currently supported are "d" (days) and "w" (weeks). E.g., "1d" means a 1 day period.
TimeResolution
1d
{ "time_resolution": "1w" }
Wizard Params
cadence
Name
Description
Type
Default
cadence
The cadence for evaluation of this verse. Only the following cadences are valid: Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w.
Cadence
1d
{ "cadence": "6h" }
default_urgency
Name
Description
Type
Default
default_urgency
The urgency class for insights created using this verse. Currently supports two values: "normal" (default) and "high". If set to "normal", then specific thresholds for "high" urgency can be set using other parameters prefixed with "high_urgency_". If set to "high", then threshold parameters prefixed with "high_urgency_" are not considered at all, since all insights of this verse will be considered as having a "high" urgency.
Urgency
normal
{ "default_urgency": "high" }
metrics
Name
Description
Type
Default
metrics
Relevant metrics to search anomalies for in the verse; relevant only for verse types that search for anomalies in metrics behavior.
min_anomaly_level
Name
Description
Type
Default
min_anomaly_level
This parameter sets the threshold for the minimal anomaly level for which an insight will be generated. Anomaly Level of this verse is the difference between the last point's value to the median value, normalized by the difference between the top_percentile_benchmark and the median.
PositiveFloat
2.5
{ "min_anomaly_level": 2 }
min_segment_size
Name
Description
Type
Default
min_segment_size
Minimal segment size to require for the entire time-series.
PositiveInt
5
{ "min_segment_size": 10 }
min_segment_size_fraction
Name
Description
Type
Default
min_segment_size_fraction
Minimal segment size, as a fraction of the baseline segment, which a segment must have in order to be considered in the search.
InclusiveFraction
0
{ "min_segment_size_fraction": 0.05 }
name
Name
Description
Type
Default
name
(Required) The name of the verse. Please note, a verse's name must be different from other verses in the same stanza.
str
None
{ "name": "confidence_outliers" }
segment_by
Name
Description
Type
Default
segment_by
The dimensions to use to segment the data in order to search for anomalies. This list must be a sublist of all arc class' dimensions. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example.
time_resolution
Name
Description
Type
Default
time_resolution
Time series time resolution period. Expected format is a positive integer followed by a unit; the units currently supported are "d" (days) and "w" (weeks). E.g., "1d" means a 1 day period.
TimeResolution
1d
{ "time_resolution": "1w" }
time_series_points
Name
Description
Type
Default
time_series_points
Size of desired entire time series.
PositiveInt
60
{ "time_series_points": 30 }
trend_directions
Name
Description
Type
Default
trend_directions
A list of allowed anomaly trend directions - either 'asc' for ascending (anomalies in which the found value is LARGER THAN the relevant benchmark), or 'desc' for descending (anomalies in which the found value is SMALLER THAN the relevant benchmark).