Description

'Sum Drift' verses configure Mona to find segments in which a metric's average periodic sum (the average of the sums of all its values in each configurable period) differs significantly between a target data set and a benchmark data set.

To decide whether to create such a signal for a given segment and metric, we build a time series for each of the target and benchmark sets, holding the sum of the metric in each time frame. We then calculate the standard deviation (STD) of the two time series joined together. Finally, we check whether the average of the target sums time series differs from the average of the benchmark sums time series by at least MULTIPLIER * STD of the joined time series.
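
The check above can be sketched in Python as follows. This is a minimal illustration, not Mona's implementation. The function names are hypothetical. The sketch assumes the population standard deviation, because the text does not say which STD estimator is used. The multiplier argument plays the role of MULTIPLIER.

```python
from statistics import mean, pstdev


def sum_drift_anomaly_level(target_sums, benchmark_sums):
    """Hypothetical sketch: |mean(target) - mean(benchmark)| / STD of the joined series."""
    joined = list(target_sums) + list(benchmark_sums)
    std = pstdev(joined)  # assumption: population STD of the joined series
    if std == 0:
        return 0.0  # all sums are identical, so the averages cannot differ
    return abs(mean(target_sums) - mean(benchmark_sums)) / std


def is_sum_drift(target_sums, benchmark_sums, multiplier):
    # Signal when the averages differ by at least multiplier * STD.
    return sum_drift_anomaly_level(target_sums, benchmark_sums) >= multiplier


# Daily sums of one metric for one segment: 3 target days, 6 benchmark days.
target = [130.0, 125.0, 135.0]
benchmark = [100.0, 104.0, 96.0, 101.0, 99.0, 100.0]
print(sum_drift_anomaly_level(target, benchmark))
print(is_sum_drift(target, benchmark, multiplier=0.3))
```

The sketch takes the absolute difference. Checking whether the drift is upward or downward is left to a separate filter such as trend_directions.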

Basic Params

benchmark_set_period

Description: Time period for the benchmark set. By default, the benchmark period immediately precedes the target set period. The expected format is "<n><unit>", where <n> can be any positive integer and <unit> is currently either "d" (days) or "w" (weeks). For example, "1d" means a 1-day period.
Type: TimePeriod
Default: 6w
Example: { "benchmark_set_period": "50d" }

benchmark_set_period_type

Description: Sets the end time of the benchmark set period. Supported values are "previous_to_target" (the benchmark period ends when the target period starts) and "latest" (both periods end on the same date).
Type: BenchmarkSetPeriodType
Default: previous_to_target
Example: { "benchmark_set_period_type": "latest" }

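
The following sketch shows how target_set_period, benchmark_set_period and benchmark_set_period_type could determine the two windows. It makes several assumptions. Periods are parsed only in the "<n>d" / "<n>w" forms described under benchmark_set_period. The target window ends on the latest data date. parse_period and compute_windows are hypothetical helpers, not Mona's get_time_period_for_string.

```python
import re
from datetime import date, timedelta


def parse_period(text):
    """Parse '<n>d' or '<n>w' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"([1-9][0-9]*)([dw])", text)
    if match is None:
        raise ValueError(f"unsupported period: {text!r}")
    count = int(match.group(1))
    return timedelta(days=count) if match.group(2) == "d" else timedelta(weeks=count)


def compute_windows(latest, target_period="2w", benchmark_period="6w",
                    period_type="previous_to_target"):
    """Return ((target_start, target_end), (benchmark_start, benchmark_end))."""
    target_end = latest  # the target set ends on the latest data date
    target_start = target_end - parse_period(target_period)
    if period_type == "previous_to_target":
        benchmark_end = target_start  # benchmark ends when the target starts
    elif period_type == "latest":
        benchmark_end = target_end  # both sets end on the same date
    else:
        raise ValueError(f"unknown benchmark_set_period_type: {period_type!r}")
    benchmark_start = benchmark_end - parse_period(benchmark_period)
    return (target_start, target_end), (benchmark_start, benchmark_end)


print(compute_windows(date(2024, 3, 31)))
print(compute_windows(date(2024, 3, 31), period_type="latest"))
```

With "latest", a benchmark period longer than the target period fully contains the target window, because both end on the same date.
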
metrics

Description: The metrics to search for anomalies in. Relevant only for verse types that search for anomalies in metric behavior.
Type: not specified
Default: []
Example: { "metrics": [ "top_score", "delta_top_to_second_score" ] }

min_anomaly_level

Description: The minimal anomaly level for which an insight is generated. In this verse, the anomaly level is the difference between the average of the target period's sums time series and the average of the benchmark period's sums time series, normalized by the STD of the joined time series.
Type: PositiveFloat
Default: 0.3
Example: { "min_anomaly_level": 1 }

min_segment_size

Description: Minimal segment size for the combined benchmark and target segments.
Type: PositiveInt
Default: 100
Example: { "min_segment_size": 100 }

min_segment_size_fraction

Description: Minimal segment size, as a fraction of the baseline segment size, that a segment must have in order to be considered in the search.
Type: InclusiveFraction
Default: 0
Example: { "min_segment_size_fraction": 0.05 }

name

Description: The name of the verse. If missing, this field defaults to "<stanza's name><verse's type>". Note that a verse's name must differ from the names of the other verses in the same stanza. Therefore, when a stanza has more than one verse of the same type, the default name is not supported, and custom names must be provided for these verses.
Type: not specified
Default: None
Example: { "name": "confidence_outliers" }

segment_by

Description: The dimensions used to segment the data when searching for anomalies. This list must be a subset of the arc class's dimensions. To avoid generating insights on specific values of a segmentation field, use the "avoid_values" key in that field's segmentation JSON object, as shown in the example.
Type: not specified
Default: []
Example: { "segment_by": [ "city", "bot_id", {"name": "provider-code", "avoid_values": ["zoom"]}, {"name": "selected-language", "avoid_values": ["eng", "spa"]} ] }

target_set_period

Description: Time period for the target set, ending on the day of the latest available data. The format is detailed in common/util.py's get_time_period_for_string.
Type: TimePeriod
Default: 2w
Example: { "target_set_period": "1w" }

time_resolution

Description: The time resolution used when building the sums time series for both the target and the benchmark sets, on which the difference is measured. The format is detailed in common/util.py's get_time_period_for_string.
Type: TimeResolution
Default: 1d
Example: { "time_resolution": "1w" }

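
To make time_resolution concrete, the sketch below turns raw (timestamp, value) records into the per-bucket sums that the verse compares. It assumes buckets are aligned to the window start and that the resolution is given as a timedelta. It also treats an empty bucket as a sum of 0, which is an assumption: the verse may handle missing buckets differently (see min_exist_freq). bucket_sums is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_sums(records, start, end, resolution):
    """Sum metric values per resolution-sized bucket in [start, end)."""
    n_buckets = (end - start) // resolution  # timedelta // timedelta -> int
    sums = defaultdict(float)
    for timestamp, value in records:
        if start <= timestamp < end:
            sums[(timestamp - start) // resolution] += value
    # Assumption: a bucket with no records contributes a sum of 0.
    return [sums[i] for i in range(n_buckets)]


records = [
    (datetime(2024, 1, 1, 9), 2.0),
    (datetime(2024, 1, 1, 17), 3.0),
    (datetime(2024, 1, 3, 12), 4.0),
    (datetime(2024, 1, 5, 8), 9.0),  # outside the window, ignored
]
print(bucket_sums(records, datetime(2024, 1, 1), datetime(2024, 1, 4), timedelta(days=1)))
```
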
trend_directions

Description: A list of allowed anomaly trend directions: "asc" for ascending (anomalies in which the found value is LARGER THAN the relevant benchmark) and "desc" for descending (anomalies in which the found value is SMALLER THAN the relevant benchmark).
Type: not specified
Default: ["asc", "desc"]
Example: { "trend_directions": [ "asc" ] }

Advanced Misc Params

avoid_same_field_for_segment_and_metric

Description: If True, insights will not be created for segments based on the same field as the given metric.
Type: bool
Default: True
Example: { "avoid_same_field_for_segment_and_metric": false }

cadence

Description: The cadence at which this verse is evaluated. Only the following cadences are valid. Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w.
Type: not specified
Default: 1d
Example: { "cadence": "6h" }

create_extra_adjacent_signals

Description: If set to True (the default), Mona creates new signals from existing signals on adjacent numeric segments. For example, if there are two signals defined on 1 <= x < 2 and 2 <= x < 3, Mona automatically creates a new signal on 1 <= x < 3. This allows the subsequent clustering algorithm to create an insight with the most relevant segment for its main signal.
Type: bool
Default: True
Example: { "create_extra_adjacent_signals": false }

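
The adjacency merge described above can be sketched as follows. The sketch treats numeric segments as half-open [lo, hi) intervals, as in the 1 <= x < 2 example. It only lists the extra segments that joining touching pairs would propose. Whether Mona also chains longer runs, and how it scores or clusters the results, is not modeled. adjacent_merges is a hypothetical helper.

```python
def adjacent_merges(intervals):
    """Return [lo, hi) intervals formed by joining touching pairs of segments.

    A pair is merged only when one interval's upper bound equals the next
    interval's lower bound, e.g. [1, 2) and [2, 3) -> [1, 3).
    """
    ordered = sorted(intervals)
    merged = []
    for (lo_a, hi_a), (lo_b, hi_b) in zip(ordered, ordered[1:]):
        if hi_a == lo_b:
            merged.append((lo_a, hi_b))
    return merged


print(adjacent_merges([(2, 3), (1, 2), (5, 6)]))
```
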
disabled

Description: If set to True, this verse is not used when searching for new insights.
Type: bool
Default: False
Example: { "disabled": true }

expire_after

Description: Insights detected by this verse continue to be considered active for at least this amount of time after they were last detected.
Type: TimePeriod
Default: 3d
Example: { "expire_after": "2d" }
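
A minimal sketch of the guaranteed active window follows. It models only the stated minimum: the text says insights stay active for at least expire_after, so they may remain active longer. The boundary is treated as inclusive, which is an assumption. is_still_active is a hypothetical helper.

```python
from datetime import datetime, timedelta


def is_still_active(last_detected, now, expire_after=timedelta(days=3)):
    """True while no more than expire_after has elapsed since the last detection."""
    return now - last_detected <= expire_after


last_seen = datetime(2024, 1, 1, 12)
print(is_still_active(last_seen, datetime(2024, 1, 3, 12)))
print(is_still_active(last_seen, datetime(2024, 1, 5, 12)))
```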

Advanced Score Calculation Params

score_anomaly_level_exponent

Description: An exponent applied to the anomaly level in the score, after the anomaly level has been multiplied by score_anomaly_level_multiplier.
Type: float
Default: 1.0
Example: { "score_anomaly_level_exponent": 0.5 }

score_anomaly_level_multiplier

Description: A multiplier applied to the anomaly level before the exponent (score_anomaly_level_exponent) is applied.
Type: float
Default: 1.0
Example: { "score_anomaly_level_multiplier": 1.2 }

score_segment_size_exponent

Description: An exponent applied to the segment's size (or relative size) in the combined score. If score_segment_size_log_base is not 0, the exponent is applied before the logarithm.
Type: float
Default: 0.5
Example: { "score_segment_size_exponent": 1.5 }

score_segment_size_log_base

Description: The log base used for the segment's size (or relative size) in the combined score. Set it to 0 to remove the logarithm altogether. Unless it is 0, this value must be larger than 1.
Type: float
Default: 0.0
Example: { "score_segment_size_log_base": 5 }

score_use_segment_absolute_size

Description: If True, the segment's absolute size is used in the combined score. Otherwise, the segment's size relative to its baseline (a fraction) is used.
Type: bool
Default: True
Example: { "score_use_segment_absolute_size": false }
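
The score parameters above can be read as two terms: an anomaly-level term and a segment-size term. The sketch below applies each parameter in the order the descriptions give. The text does not say how the two terms are combined. Multiplying them is an assumption made only for illustration. combined_score is a hypothetical function, not Mona's scoring code.

```python
import math


def combined_score(anomaly_level, segment_size, baseline_size,
                   level_multiplier=1.0, level_exponent=1.0,
                   size_exponent=0.5, size_log_base=0.0,
                   use_absolute_size=True):
    """Hypothetical sketch of how the score parameters could interact."""
    # Anomaly-level term: multiply first, then apply the exponent.
    level_term = (anomaly_level * level_multiplier) ** level_exponent
    # Size term: absolute size, or size relative to the baseline segment.
    size = segment_size if use_absolute_size else segment_size / baseline_size
    size_term = size ** size_exponent  # the exponent is applied first...
    if size_log_base != 0:
        if size_log_base <= 1:
            raise ValueError("score_segment_size_log_base must be 0 or larger than 1")
        size_term = math.log(size_term, size_log_base)  # ...then the logarithm
    # Assumption: the two terms are combined by multiplication.
    return level_term * size_term


print(combined_score(2.0, 400, 10_000))
print(combined_score(2.0, 400, 10_000, use_absolute_size=False))
```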

Anomaly Thresholds Params

epsilon

Description: Minimal absolute difference between the benchmark and the value.
Type: NonNegativeFloat
Default: 0.01
Example: { "epsilon": 0.5 }

min_anomaly_level

Description: The minimal anomaly level for which an insight is generated. In this verse, the anomaly level is the difference between the average of the target period's sums time series and the average of the benchmark period's sums time series, normalized by the STD of the joined time series.
Type: PositiveFloat
Default: 0.3
Example: { "min_anomaly_level": 1 }

min_score

Description: The minimal score for a signal to be considered an anomaly.
Type: float
Default: 0.0
Example: { "min_score": 4 }

Data Filtering Params

avoid_segmenting_on_missing

Description: If True, insights are not generated for segments that are (partially or fully) defined by a missing field.
Type: bool
Default: False
Example: { "avoid_segmenting_on_missing": true }

baseline_segment

Description: The baseline segment of this verse. This segment defines "the world" as far as this verse is concerned: only data from this segment is considered when finding insights.
Type: not specified
Default: {}
Example: { "baseline_segment": { "model_version": [ { "value": "V1" } ] } }

benchmark_baseline_segment

Description: Benchmark baseline segment. This segment is intersected with any data searched for in the benchmark segments.
Type: Segment
Default: {}
Example: { "benchmark_baseline_segment": { "model_version": [ { "value": "V2" } ] } }

enhance_exclude_segments

Description: If True, exclude segments added at any configuration level (the verse, the stanza or stanzas_global_defaults) are ADDED to the excluded segments of the higher-level defaults, if any exist. For example, suppose stanzas_global_defaults has a single excluded segment {dimensionA: MISSING} and the stanza (or verse) has a single excluded segment {dimensionB: 0}. When enhance_exclude_segments is True, the excluded segments include both {dimensionA: MISSING} and {dimensionB: 0}, and data matching either one is filtered out. When enhance_exclude_segments is False, the excluded segments are overridden to just the verse's single segment {dimensionB: 0}.
Type: bool
Default: False
Example: { "enhance_exclude_segments": true }

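
The merge rule above can be sketched as follows. The sketch assumes the levels are processed from most general to most specific (stanzas_global_defaults, stanza, verse), and that None means a level does not set exclude_segments. The segments use the shorthand from the description rather than full segment syntax. effective_exclude_segments is a hypothetical helper.

```python
def effective_exclude_segments(levels, enhance):
    """levels: exclude_segments per level, most general first, e.g.
    [stanzas_global_defaults, stanza, verse]; None means "not set"."""
    result = []
    for segments in levels:
        if segments is None:
            continue  # this level does not define exclude_segments
        if enhance:
            # Add this level's segments to those inherited from above.
            result = result + [s for s in segments if s not in result]
        else:
            # A more specific level replaces the inherited list.
            result = list(segments)
    return result


global_defaults = [{"dimensionA": "MISSING"}]
verse = [{"dimensionB": 0}]
print(effective_exclude_segments([global_defaults, None, verse], enhance=True))
print(effective_exclude_segments([global_defaults, None, verse], enhance=False))
```
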
exclude_segments

Description: Segments to exclude from the baseline of this verse. No searched data includes these segments, whether in the tested segments or in any benchmarks used to find the anomalies. Whether this parameter overrides exclude_segments definitions at other levels is determined by enhance_exclude_segments.
Type: not specified
Default: []
Example: { "exclude_segments": [ { "text_length": [ { "min_value": 0, "max_value": 100 } ] } ] }

target_baseline_segment

Description: Target baseline segment. This segment is intersected with any data searched for in the tested segments.
Type: Segment
Default: {}
Example: { "target_baseline_segment": { "model_version": [ { "value": "V1" } ] } }

Related Anomalies Params

avoid_related_anomalies_for

Description: A list of fields to skip when checking for anomalies correlated with the main anomaly of a generated insight. See find_related_anomalies_for for further details.
Type: not specified
Default: ()
Example: { "avoid_related_anomalies_for": ["delta_top_to_second_score"] }

find_related_anomalies_for

Description: A list of fields to check for anomalies correlated with the main anomaly of a generated insight. These correlated anomalies may help explain the possible cause of an insight. Leave empty to search all fields.
Type: not specified
Default: ()
Example: { "find_related_anomalies_for": ["sentiment_score", "confidence_interval"] }

related_anomalies_min_correlation

Description: Minimal Pearson correlation between the metric on which an anomaly was found and another metric with an anomaly on the same segment. Below this value, Mona does not use the other metric as a related anomaly.
Type: NonNegativeFloat
Default: 0.3
Example: { "related_anomalies_min_correlation": 0.5 }

Segmentation Params

always_segment_baseline_by

Description: A list of dimensions to always segment the baseline segment by. This is useful for separating the world into completely unrelated parts, e.g., when a different model is developed for each customer and there is no need to look for insights across customers. To avoid generating insights on specific values of a segmentation field, use the "avoid_values" key in that field's segmentation JSON object, as shown in the example.
Type: not specified
Default: ()
Example: { "always_segment_baseline_by": [ "country", {"name": "city", "avoid_values": ["Tel Aviv"]} ] }

max_segment_baseline_by_depth

Description: The maximum number of fields Mona combines when segmenting the baseline (if segment_baseline_by fields are given).
Type: PositiveInt
Default: 2
Example: { "max_segment_baseline_by_depth": 3 }

max_segment_by_depth

Description: The maximum number of fields Mona combines to create sub-segments to search in. Baseline segment fields and parent fields are "free" and do not count toward the depth. Note that this parameter has an exponential effect on performance and should be kept within SLAs.
Type: PositiveInt
Default: 2
Example: { "max_segment_by_depth": 3 }

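
The sketch below shows why max_segment_by_depth affects performance so strongly. It counts the candidate segments produced by combining segment_by fields up to a given depth. It assumes every combination of distinct values is a candidate. It ignores free baseline and parent fields, avoid_values and size thresholds, which would all change the real count. The cardinalities are made-up numbers.

```python
from itertools import combinations
from math import prod


def field_combinations(fields, min_depth=0, max_depth=2):
    """All combinations of min_depth..max_depth segment_by fields."""
    return [combo
            for depth in range(min_depth, max_depth + 1)
            for combo in combinations(fields, depth)]


def segment_count(cardinalities, min_depth=0, max_depth=2):
    """Concrete segments to test, given each field's number of distinct values."""
    return sum(prod(cardinalities[field] for field in combo)
               for combo in field_combinations(list(cardinalities), min_depth, max_depth))


cardinalities = {"city": 50, "bot_id": 20, "provider-code": 5, "selected-language": 10}
print(segment_count(cardinalities, max_depth=2))
print(segment_count(cardinalities, max_depth=3))
```

In this example, raising the depth from 2 to 3 increases the count almost tenfold.
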
min_segment_baseline_by_depth

Description: The minimum number of fields Mona combines when segmenting the baseline (if segment_baseline_by fields are given).
Type: NonNegativeInt
Default: 0
Example: { "min_segment_baseline_by_depth": 1 }

min_segment_by_depth

Description: The minimum number of fields Mona combines to create sub-segments to search in.
Type: NonNegativeInt
Default: 0
Example: { "min_segment_by_depth": 1 }

segment_baseline_by

Description: A list of dimensions by which the baseline segment may be segmented. To avoid generating insights on specific values of a segmentation field, use the "avoid_values" key in that field's segmentation JSON object.
Type: not specified
Default: ()
Example: { "segment_baseline_by": [ "model_version" ] }

segment_by

Description: The dimensions used to segment the data when searching for anomalies. This list must be a subset of the arc class's dimensions. To avoid generating insights on specific values of a segmentation field, use the "avoid_values" key in that field's segmentation JSON object, as shown in the example.
Type: not specified
Default: []
Example: { "segment_by": [ "city", "bot_id", {"name": "provider-code", "avoid_values": ["zoom"]}, {"name": "selected-language", "avoid_values": ["eng", "spa"]} ] }

Size Thresholds Params

baseline_min_segment_size

Description: Minimal segment size for the baseline segment.
Type: PositiveFloat
Default: 1
Example: { "baseline_min_segment_size": 100 }

benchmark_baseline_min_segment_size

Description: Minimal segment size for the benchmark baseline segment.
Type: PositiveFloat
Default: 1
Example: { "benchmark_baseline_min_segment_size": 100 }

benchmark_max_segment_size

Description: Maximal benchmark segment size in number of records. Leave empty to apply no such threshold.
Type: PositiveIntOrNone
Default: None
Example: { "benchmark_max_segment_size": 1000 }

benchmark_max_segment_size_fraction

Description: Maximal benchmark segment size, as a fraction of the baseline segment size. Leave empty to apply no such threshold.
Type: NonInclusiveFractionOrNone
Default: None
Example: { "benchmark_max_segment_size_fraction": 0.2 }

benchmark_min_segment_size

Description: Minimal benchmark segment size in number of records.
Type: PositiveInt
Default: 100
Example: { "benchmark_min_segment_size": 50 }

benchmark_min_segment_size_fraction

Description: Minimal benchmark segment size, as a fraction of the baseline segment size.
Type: InclusiveFraction
Default: 0
Example: { "benchmark_min_segment_size_fraction": 0.05 }

max_segment_size

Description: Maximal size a segment may have; larger segments are not considered in the search. Leave empty to apply no such threshold.
Type: PositiveIntOrNone
Default: None
Example: { "max_segment_size": 10000 }

max_segment_size_fraction

Description: Maximal segment size, as a fraction of the baseline segment size, that a segment may have. Leave empty to apply no such threshold.
Type: NonInclusiveFractionOrNone
Default: None
Example: { "max_segment_size_fraction": 0.2 }

min_exist_freq

Description: The minimum fraction of timestamps in which a segment needs to appear in order to be counted in target-benchmark signals. See the example in time_series_verse.py.
Type: InclusiveFraction
Default: 0.25
Example: { "min_exist_freq": 0.5 }

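
The sketch below illustrates the min_exist_freq check. It does not reproduce the example in time_series_verse.py. It assumes that "timestamps" are the time_resolution buckets of the period and that a segment "appears" in a bucket when it has at least one record there. passes_min_exist_freq is a hypothetical helper.

```python
def passes_min_exist_freq(bucket_counts, min_exist_freq=0.25):
    """bucket_counts: the segment's record count in each time bucket."""
    if not bucket_counts:
        return False
    present = sum(1 for count in bucket_counts if count > 0)
    return present / len(bucket_counts) >= min_exist_freq


counts = [0, 0, 3, 0, 0, 0, 0, 1]  # the segment appears in 2 of 8 buckets
print(passes_min_exist_freq(counts))
print(passes_min_exist_freq(counts, min_exist_freq=0.5))
```
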
min_segment_size

Description: Minimal segment size for the combined benchmark and target segments.
Type: PositiveInt
Default: 100
Example: { "min_segment_size": 100 }

min_segment_size_fraction

Description: Minimal segment size, as a fraction of the baseline segment size, that a segment must have in order to be considered in the search.
Type: InclusiveFraction
Default: 0
Example: { "min_segment_size_fraction": 0.05 }

min_target_benchmark_size_ratio

Description: The minimum required ratio of segment size between the target and benchmark periods, after normalizing for the length of each period's time range.
Type: InclusiveFraction
Default: 0.01
Example: { "min_target_benchmark_size_ratio": 0.05 }

target_baseline_min_segment_size

Description: Minimal segment size for the target baseline segment.
Type: PositiveFloat
Default: 1
Example: { "target_baseline_min_segment_size": 0.8 }

target_max_segment_size

Description: Maximal target segment size in number of records. Leave empty to apply no such threshold.
Type: PositiveIntOrNone
Default: None
Example: { "target_max_segment_size": 10000 }

target_max_segment_size_fraction

Description: Maximal target segment size, as a fraction of the baseline segment size. Leave empty to apply no such threshold.
Type: NonInclusiveFractionOrNone
Default: None
Example: { "target_max_segment_size_fraction": 0.2 }

target_min_segment_size

Description: Minimal target segment size in number of records.
Type: PositiveInt
Default: 100
Example: { "target_min_segment_size": 50 }

target_min_segment_size_fraction

Description: Minimal target segment size, as a fraction of the baseline segment size.
Type: InclusiveFraction
Default: 0
Example: { "target_min_segment_size_fraction": 0.05 }

Visuals and Enrichments Params

field_vectors

Description: Lists metric vectors for the frontend (FE) to show on this verse's insight cards. Each value is either a string, which should correspond to a kapi_vector name in the config, or an array, which should be treated as an ad-hoc kapi vector defined specifically for this verse.
Type: not specified
Default: []
Example: { "field_vectors": [ "field_vector_group_1", "field_vector_group_2", "field_vector_group_3" ] }

investigate_no_drill

Description: Determines which link to the investigations page is added to found insights. If True, the link points to the investigations page with a drilldown to the found segment. If False, the link points to the investigations page without a drilldown, but with the found segment selected so it can be compared with a higher-level benchmark.
Type: bool
Default: False
Example: { "investigate_no_drill": true }

time_resolution

Description: The time resolution used when building the sums time series for both the target and the benchmark sets, on which the difference is measured. The format is detailed in common/util.py's get_time_period_for_string.
Type: TimeResolution
Default: 1d
Example: { "time_resolution": "1w" }