NewSegments

Description

'New Segments' verses configure Mona to find new segments in the data, that is, segments that started appearing only recently and did not exist earlier. The period in which a new segment is searched for is configurable, to support late updates.

{
  "stanzas": {
    "stanza_name": {
      "verses": [
        {
          "type": "NewSegments",
          "name": "loan__NewSegments",
          "backfill_points": 10,
          "min_segment_size": 1,
          "segment_by": [
            "occupation",
            "city"
          ],
          "time_period": "7d"
        }
      ]
    }
  }
}

In this example, the NewSegments verse is configured to search for new segments in any specific "occupation" or "city" (or any intersection of occupation and city) that have not yet appeared in Mona. The verse looks for new segments in the last 10 periods (backfill_points) of 7 days each (time_period).
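The window arithmetic of this example can be sketched in Python. This is only an illustration, not Mona's implementation: `parse_time_period` and `total_lookback` are hypothetical helpers, and the sketch assumes that `backfill_points` simply multiplies `time_period`.

```python
import re
from datetime import timedelta

# Hypothetical helper: turns strings such as "7d" or "1w" into a timedelta.
_UNITS = {"d": "days", "w": "weeks"}


def parse_time_period(text: str) -> timedelta:
    match = re.fullmatch(r"([1-9]\d*)([dw])", text)
    if match is None:
        raise ValueError(f"unsupported time period: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def total_lookback(backfill_points: int, time_period: str) -> timedelta:
    # Assumption: the verse scans backfill_points consecutive periods.
    return backfill_points * parse_time_period(time_period)


print(total_lookback(10, "7d").days)  # 70
```

Under that assumption, the example verse covers the last 70 days of data.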

Basic Params

cadence
Name | Description | Type | Default
cadence | The cadence for evaluation of this verse. Only the following cadences are valid: Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w. | Cadence | 1d
{ "cadence": "6h" }
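Since only a fixed list of cadences is accepted, a config can be checked before it is uploaded. A minimal Python sketch, where `is_valid_cadence` is a hypothetical helper (not part of Mona) and the set is copied from the description above:

```python
# Valid cadences, copied from the parameter description.
VALID_CADENCES = {
    "1m", "5m", "10m", "15m", "20m", "30m",
    "1h", "2h", "3h", "4h", "6h", "8h", "12h",
    "1d", "2d", "3d", "4d", "5d", "6d",
    "1w", "2w", "3w", "4w", "5w",
}


def is_valid_cadence(value: str) -> bool:
    # Membership test only; no parsing of arbitrary "<number><unit>" strings.
    return value in VALID_CADENCES


print(is_valid_cadence("6h"))  # True
print(is_valid_cadence("7d"))  # False
```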
default_urgency
Name | Description | Type | Default
default_urgency | The urgency class for insights created using this verse. Currently supports two values: "normal" (default) and "high". If set to "normal", then specific thresholds for "high" urgency can be set using other parameters prefixed with "high_urgency_". If set to "high", then threshold parameters prefixed with "high_urgency_" are not considered at all, since all insights of this verse will be considered as having a "high" urgency. | Urgency | normal
{ "default_urgency": "high" }
description
Name | Description | Type | Default
description | Verse description. | str | (empty)
{ "description": "searches for asc drifts in output_score" }
min_segment_size
Name | Description | Type | Default
min_segment_size | Minimal segment size which a segment must have in order to be considered in the search. | PositiveInt | 1
{ "min_segment_size": 100 }
min_segment_size_fraction
Name | Description | Type | Default
min_segment_size_fraction | Minimal segment size, as a fraction of the baseline segment, which a segment must have in order to be considered in the search. | InclusiveFraction | 0
{ "min_segment_size_fraction": 0.05 }
name
Name | Description | Type | Default
name | (Required) The name of the verse. Please note, a verse's name must be different from other verses in the same stanza. | str | None
{ "name": "confidence_outliers" }
segment_by
Name | Description | Type | Default
segment_by | The dimensions to use to segment the data in order to search for anomalies. This list must be a sublist of all arc class' dimensions. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example. | SegmentationsList | ()
{ "segment_by": [ "city", "bot_id", {"name": "provider-code", "avoid_values": ["zoom"]}, {"name": "selected-language", "avoid_values": ["eng", "spa"]}, {"name": "country", "include_only_values": ["jpn"]} ] }
staleness_period
Name | Description | Type | Default
staleness_period | How far back to verify that the new segments have not appeared. An empty string means no limit, that is, searching from the beginning of the available data. This is useful for notifications of re-surfacing segments. Format detailed in common/util.py's get_time_period_for_string. | TimePeriodOrEmpty | (empty)
{ "staleness_period": "10d" }
time_period
Name | Description | Type | Default
time_period | Latest time period to look for new segments in. Expected format is "<number><unit>", where <number> can be any positive integer and <unit> options currently include "d" (days) or "w" (weeks). E.g., "1d" means a 1-day period. | TimePeriodOrEmpty | 1d
{ "time_period": "1w" }
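How time_period and staleness_period might interact can be sketched as a set difference: segments seen in the latest period, minus segments seen in the staleness window before it. This is one possible reading, not Mona's actual algorithm. `find_new_segments` and the `(timestamp, segment)` record shape are hypothetical, and the sketch assumes the staleness window ends where the latest period begins.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set, Tuple

Record = Tuple[datetime, str]  # (timestamp, segment value); hypothetical shape


def find_new_segments(
    records: Iterable[Record],
    now: datetime,
    time_period: timedelta,
    staleness_period: Optional[timedelta] = None,
) -> Set[str]:
    recent_start = now - time_period
    # None stands in for the empty-string setting: look back over all data.
    stale_start = None if staleness_period is None else recent_start - staleness_period
    recent, earlier = set(), set()
    for ts, segment in records:
        if recent_start <= ts <= now:
            recent.add(segment)
        elif ts < recent_start and (stale_start is None or ts >= stale_start):
            earlier.add(segment)
    # "New" means present in the latest period but absent from the earlier window.
    return recent - earlier


now = datetime(2024, 1, 31)
records = [
    (datetime(2024, 1, 30, 12), "paris"),   # first appearance
    (datetime(2024, 1, 25), "london"),      # seen within the staleness window
    (datetime(2024, 1, 30, 13), "london"),
    (datetime(2024, 1, 1), "rome"),         # seen long ago only
    (datetime(2024, 1, 30, 14), "rome"),    # re-surfacing
]
print(sorted(find_new_segments(records, now, timedelta(days=1), timedelta(days=10))))
print(sorted(find_new_segments(records, now, timedelta(days=1))))
```

With a 10-day staleness period the re-surfacing segment is reported alongside the first-time one; with no limit only the first-time segment is.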

Advanced Misc Params

cookbook
Name | Description | Type | Default
cookbook | Instructions on how to read an insight generated by this verse. Expected format is MarkDown. | Cookbook | (empty)
{ "cookbook": "Use **this param** to add instructions using [markdown](https://daringfireball.net/projects/markdown/syntax) syntax on how to read insights generated from this `verse`, and what should the insight recipient do with it." }
disabled
Name | Description | Type | Default
disabled | If set to True - this verse won't be used when searching for new insights. | bool | False
{ "disabled": true }
expire_after
Name | Description | Type | Default
expire_after | Insights detected by this verse will continue to be considered active for at least this amount of time after the last time they were detected. | TimePeriodOrEmpty | 3d
{ "expire_after": "2d" }
num_largest_per_dimensions_set
Name | Description | Type | Default
num_largest_per_dimensions_set | For each dimensions set we look for new segments in, this is the number of biggest segments we consider. E.g., 100 means we will only check the largest 100 values in each dimensions set. | int | 100
{ "num_largest_per_dimensions_set": 200 }
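The "largest N values" cut can be pictured as a top-N by record count. A minimal Python sketch, assuming segment size means the number of records with that value; `largest_segments` is a hypothetical helper, not a Mona function:

```python
from collections import Counter
from typing import Iterable, List


def largest_segments(values: Iterable[str], num_largest: int) -> List[str]:
    # Count records per value and keep the num_largest most frequent values.
    counts = Counter(values)
    return [value for value, _ in counts.most_common(num_largest)]


cities = ["paris", "paris", "paris", "rome", "rome", "oslo"]
print(largest_segments(cities, 2))  # ['paris', 'rome']
```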
relevant_data_time_buffer
Name | Description | Type | Default
relevant_data_time_buffer | Adds an end-time buffer to the insight generation. For example - If this param's value is "1d", then insights are generated for a day before the latest received data. This is useful for processes in which it takes a specific period of time to get all the healthy monitoring data in place. | TimePeriodOrEmpty | (empty)
{ "relevant_data_time_buffer": "1d" }
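The effect of the buffer on the evaluated window can be sketched as follows. This is an illustration under the assumption that the buffer shifts the end of the window back from the latest received data; `evaluation_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Tuple


def evaluation_window(
    latest_data_time: datetime, time_period: timedelta, buffer: timedelta
) -> Tuple[datetime, datetime]:
    # The window ends one buffer before the latest data and spans one time_period.
    end = latest_data_time - buffer
    return end - time_period, end


start, end = evaluation_window(
    datetime(2024, 3, 10), timedelta(days=1), timedelta(days=1)
)
print(start.date(), end.date())  # 2024-03-08 2024-03-09
```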
timestamp_field_name
Name | Description | Type | Default
timestamp_field_name | The field that is used as the time dimension for insight generation. | TimestampField | timestamp
{ "timestamp_field_name": "run_end_time" }
timezone
Name | Description | Type | Default
timezone | The timezone used to aggregate daily data points. Accepts any IANA time zone ID: (https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) | Timezone | UTC
{ "timezone": "Asia/Hong_Kong" }
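Why the timezone matters for daily aggregation: the same instant can fall on different calendar days. The Python sketch below uses a fixed UTC+8 offset as a stand-in for "Asia/Hong_Kong" (which has no daylight saving time) so that it runs without a tz database; `local_day` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

# Stand-in for "Asia/Hong_Kong"; zoneinfo.ZoneInfo could be used where tz data is installed.
HONG_KONG = timezone(timedelta(hours=8))


def local_day(ts: datetime, tz: tzinfo) -> date:
    # The calendar day a timestamp is bucketed into depends on the timezone.
    return ts.astimezone(tz).date()


ts = datetime(2024, 1, 1, 20, 0, tzinfo=timezone.utc)
print(local_day(ts, timezone.utc))  # 2024-01-01
print(local_day(ts, HONG_KONG))     # 2024-01-02
```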

Data Filtering Params

avoid_segmenting_on_missing
Name | Description | Type | Default
avoid_segmenting_on_missing | When true, insights will not be generated for segments which are (partially or fully) defined by a missing field. | bool | False
{ "avoid_segmenting_on_missing": true }
baseline_segment
Name | Description | Type | Default
baseline_segment | The baseline segment of this verse. This segment defines "the world" as far as this verse is concerned. Only data from this segment will be considered when finding insights. | Segment | {}
{ "baseline_segment": { "model_version": [ { "value": "V1" } ] } }
enhance_exclude_segments
Name | Description | Type | Default
enhance_exclude_segments | If True, when exclude segments are added to any level of configuration (either in the verse, the stanza or the stanzas_global_defaults) they are ADDED to the excluded segments of higher level defaults, if any exist. For example, if we have in stanzas_global_defaults a single excluded segment of {dimensionA: MISSING}, and the stanza (or verse) has a single excluded segment of {dimensionB: 0}, then if enhance_exclude_segments is True, the excluded segments will include both {dimensionA: MISSING} and {dimensionB: 0} and will filter either one. Otherwise (if enhance_exclude_segments is False) it will be overridden to just the one segment in the verse {dimensionB: 0}. | bool | False
{ "enhance_exclude_segments": true }
exclude_segments
Name | Description | Type | Default
exclude_segments | Segments to exclude from the baseline of this verse. The data searched will not include these segments, in both the tested segments and any benchmarks used to find the anomalies. Note that whether this param overrides definitions of exclude_segments at other levels is decided by enhance_exclude_segments. | SegmentsList | ()
{ "exclude_segments": [ { "text_length": [ { "min_value": 0, "max_value": 100 } ] } ] }
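The merge-or-override behaviour described for enhance_exclude_segments can be sketched in Python. `effective_exclude_segments` is a hypothetical helper, and the segments use the shorthand from the description ({dimensionA: MISSING}) rather than the full config format:

```python
from typing import Dict, List

Segment = Dict[str, object]


def effective_exclude_segments(
    higher_level: List[Segment], own: List[Segment], enhance: bool
) -> List[Segment]:
    if enhance:
        # Own exclusions are added to those inherited from higher levels.
        return list(higher_level) + list(own)
    # Own exclusions replace the inherited ones.
    return list(own)


global_defaults = [{"dimensionA": "MISSING"}]
verse_level = [{"dimensionB": 0}]
print(effective_exclude_segments(global_defaults, verse_level, enhance=True))
print(effective_exclude_segments(global_defaults, verse_level, enhance=False))
```

The first call keeps both exclusions, the second keeps only the verse-level one, matching the example in the enhance_exclude_segments description.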

Required Params

name
Name | Description | Type | Default
name | (Required) The name of the verse. Please note, a verse's name must be different from other verses in the same stanza. | str | None
{ "name": "confidence_outliers" }

Segmentation Params

always_segment_baseline_by
Name | Description | Type | Default
always_segment_baseline_by | A list of dimensions to always segment the baseline segment by. This is useful when separating the world to completely unrelated parts - e.g., in a case where you have a different model developed for each customer and there's no need to look for insights across different customers. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example. | SegmentationsList | ()
{ "always_segment_baseline_by": [ "country", {"name": "city", "avoid_values": ["Tel Aviv"]} ] }
avoid_segmenting_on_missing
Name | Description | Type | Default
avoid_segmenting_on_missing | When true, insights will not be generated for segments which are (partially or fully) defined by a missing field. | bool | False
{ "avoid_segmenting_on_missing": true }
max_segment_by_depth
Name | Description | Type | Default
max_segment_by_depth | The maximum number of fields Mona should combine to create sub-segments to search in. Baseline segment fields and parent fields are "free", and are not counted for depth. Notice this parameter has an exponential effect on performance and should be kept within SLAs. | PositiveInt | 1
{ "max_segment_by_depth": 2 }
min_segment_by_depth
Name | Description | Type | Default
min_segment_by_depth | The minimum number of fields Mona should combine to create sub-segments to search in. | NonNegativeInt | 0
{ "min_segment_by_depth": 1 }
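The two depth parameters bound how many segment_by fields are combined at once. The Python sketch below enumerates the resulting dimension sets with `itertools.combinations`; `dimension_sets` is a hypothetical helper, and treating depth 0 as the baseline segment itself is an assumption.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def dimension_sets(
    segment_by: Sequence[str], min_depth: int, max_depth: int
) -> List[Tuple[str, ...]]:
    sets: List[Tuple[str, ...]] = []
    for depth in range(min_depth, max_depth + 1):
        # Depth 0 yields one empty tuple (assumed to mean the baseline itself).
        sets.extend(combinations(segment_by, depth))
    return sets


print(dimension_sets(["occupation", "city"], 1, 2))
# [('occupation',), ('city',), ('occupation', 'city')]
print(len(dimension_sets(["a", "b", "c", "d", "e"], 1, 1)))  # 5
print(len(dimension_sets(["a", "b", "c", "d", "e"], 1, 3)))  # 25
```

The jump from 5 to 25 dimension sets for five fields, before values are even enumerated, shows why a larger max_segment_by_depth is costly.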
segment_by
Name | Description | Type | Default
segment_by | The dimensions to use to segment the data in order to search for anomalies. This list must be a sublist of all arc class' dimensions. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example. | SegmentationsList | ()
{ "segment_by": [ "city", "bot_id", {"name": "provider-code", "avoid_values": ["zoom"]}, {"name": "selected-language", "avoid_values": ["eng", "spa"]}, {"name": "country", "include_only_values": ["jpn"]} ] }
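The effect of "avoid_values" and "include_only_values" on a single segmentation entry can be sketched as a value filter. `allowed_values` is a hypothetical helper that accepts either a plain field name or the object form shown in the example above:

```python
from typing import Dict, List, Sequence, Union

SegmentationSpec = Union[str, Dict[str, object]]


def allowed_values(spec: SegmentationSpec, observed: Sequence[str]) -> List[str]:
    if isinstance(spec, str):
        # A plain field name places no restriction on values.
        return list(observed)
    include = spec.get("include_only_values")
    avoid = set(spec.get("avoid_values", []))
    return [
        value
        for value in observed
        if (include is None or value in include) and value not in avoid
    ]


language = {"name": "selected-language", "avoid_values": ["eng", "spa"]}
country = {"name": "country", "include_only_values": ["jpn"]}
print(allowed_values(language, ["eng", "fra", "spa"]))  # ['fra']
print(allowed_values(country, ["jpn", "usa"]))          # ['jpn']
print(allowed_values("city", ["paris", "rome"]))        # ['paris', 'rome']
```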

Size Thresholds Params

baseline_min_segment_size
Name | Description | Type | Default
baseline_min_segment_size | Minimal segment size for the baseline segment. | PositiveFloat | 1
{ "baseline_min_segment_size": 100 }
high_urgency_baseline_min_segment_size
Name | Description | Type | Default
high_urgency_baseline_min_segment_size | Threshold for separating between "high" and "normal" urgency insights with regards to baseline_min_segment_size. See "baseline_min_segment_size" param for more details on the functionality of this param. | PositiveFloatOrNone | None
{ "high_urgency_baseline_min_segment_size": 1000 }
high_urgency_min_segment_size
Name | Description | Type | Default
high_urgency_min_segment_size | Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size. See "min_segment_size" param for more details on the functionality of this param. | PositiveIntOrNone | None
{ "high_urgency_min_segment_size": 1000 }
high_urgency_min_segment_size_fraction
Name | Description | Type | Default
high_urgency_min_segment_size_fraction | Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size_fraction. See "min_segment_size_fraction" param for more details on the functionality of this param. | InclusiveFractionOrNone | None
{ "high_urgency_min_segment_size_fraction": 0.2 }
max_segment_size
Name | Description | Type | Default
max_segment_size | Maximal size a segment may have (bigger segments won't be considered in the search). Leave empty to not have such a threshold. | PositiveIntOrNone | None
{ "max_segment_size": 10000 }
max_segment_size_fraction
Name | Description | Type | Default
max_segment_size_fraction | Maximal size a segment may have, as a fraction of the baseline segment. Leave empty to not have such a threshold. | NonInclusiveFractionOrNone | None
{ "max_segment_size_fraction": 0.2 }
min_segment_size
Name | Description | Type | Default
min_segment_size | Minimal segment size which a segment must have in order to be considered in the search. | PositiveInt | 1
{ "min_segment_size": 100 }
min_segment_size_fraction
Name | Description | Type | Default
min_segment_size_fraction | Minimal segment size, as a fraction of the baseline segment, which a segment must have in order to be considered in the search. | InclusiveFraction | 0
{ "min_segment_size_fraction": 0.05 }

Time Related Params

avoid_sub_segments_period
Name | Description | Type | Default
avoid_sub_segments_period | Time period in which detecting subsegments of a recently detected new segment will not cause a new insight to be generated. | TimePeriod | 7d
{ "avoid_sub_segments_period": "14d" }
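The sub-segment suppression described above can be sketched as follows. This is an illustrative reading, not Mona's implementation: segments are modelled as plain dicts of field to value, and `is_sub_segment` and `should_suppress` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Segment = Dict[str, str]


def is_sub_segment(candidate: Segment, parent: Segment) -> bool:
    # A sub-segment fixes every field of the parent plus at least one more.
    return len(candidate) > len(parent) and all(
        candidate.get(field) == value for field, value in parent.items()
    )


def should_suppress(
    candidate: Segment,
    now: datetime,
    recent_detections: List[Tuple[Segment, datetime]],
    period: timedelta,
) -> bool:
    # Suppress if any parent segment was detected within the period.
    return any(
        is_sub_segment(candidate, parent) and now - detected_at <= period
        for parent, detected_at in recent_detections
    )


detections = [({"city": "Paris"}, datetime(2024, 1, 1))]
nurse_in_paris = {"city": "Paris", "occupation": "nurse"}
print(should_suppress(nurse_in_paris, datetime(2024, 1, 5), detections, timedelta(days=7)))   # True
print(should_suppress(nurse_in_paris, datetime(2024, 1, 20), detections, timedelta(days=7)))  # False
```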
cadence
Name | Description | Type | Default
cadence | The cadence for evaluation of this verse. Only the following cadences are valid: Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w. | Cadence | 1d
{ "cadence": "6h" }
expire_after
Name | Description | Type | Default
expire_after | Insights detected by this verse will continue to be considered active for at least this amount of time after the last time they were detected. | TimePeriodOrEmpty | 3d
{ "expire_after": "2d" }
relevant_data_time_buffer
Name | Description | Type | Default
relevant_data_time_buffer | Adds an end-time buffer to the insight generation. For example - If this param's value is "1d", then insights are generated for a day before the latest received data. This is useful for processes in which it takes a specific period of time to get all the healthy monitoring data in place. | TimePeriodOrEmpty | (empty)
{ "relevant_data_time_buffer": "1d" }
staleness_period
Name | Description | Type | Default
staleness_period | How far back to verify that the new segments have not appeared. An empty string means no limit, that is, searching from the beginning of the available data. This is useful for notifications of re-surfacing segments. Format detailed in common/util.py's get_time_period_for_string. | TimePeriodOrEmpty | (empty)
{ "staleness_period": "10d" }
time_period
Name | Description | Type | Default
time_period | Latest time period to look for new segments in. Expected format is "<number><unit>", where <number> can be any positive integer and <unit> options currently include "d" (days) or "w" (weeks). E.g., "1d" means a 1-day period. | TimePeriodOrEmpty | 1d
{ "time_period": "1w" }
timestamp_field_name
Name | Description | Type | Default
timestamp_field_name | The field that is used as the time dimension for insight generation. | TimestampField | timestamp
{ "timestamp_field_name": "run_end_time" }
timezone
Name | Description | Type | Default
timezone | The timezone used to aggregate daily data points. Accepts any IANA time zone ID: (https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) | Timezone | UTC
{ "timezone": "Asia/Hong_Kong" }

Urgency Params

default_urgency
Name | Description | Type | Default
default_urgency | The urgency class for insights created using this verse. Currently supports two values: "normal" (default) and "high". If set to "normal", then specific thresholds for "high" urgency can be set using other parameters prefixed with "high_urgency_". If set to "high", then threshold parameters prefixed with "high_urgency_" are not considered at all, since all insights of this verse will be considered as having a "high" urgency. | Urgency | normal
{ "default_urgency": "high" }
high_urgency_baseline_min_segment_size
Name | Description | Type | Default
high_urgency_baseline_min_segment_size | Threshold for separating between "high" and "normal" urgency insights with regards to baseline_min_segment_size. See "baseline_min_segment_size" param for more details on the functionality of this param. | PositiveFloatOrNone | None
{ "high_urgency_baseline_min_segment_size": 1000 }
high_urgency_min_segment_size
Name | Description | Type | Default
high_urgency_min_segment_size | Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size. See "min_segment_size" param for more details on the functionality of this param. | PositiveIntOrNone | None
{ "high_urgency_min_segment_size": 1000 }
high_urgency_min_segment_size_fraction
Name | Description | Type | Default
high_urgency_min_segment_size_fraction | Threshold for separating between "high" and "normal" urgency insights with regards to min_segment_size_fraction. See "min_segment_size_fraction" param for more details on the functionality of this param. | InclusiveFractionOrNone | None
{ "high_urgency_min_segment_size_fraction": 0.2 }
high_urgency_require_all_criteria
Name | Description | Type | Default
high_urgency_require_all_criteria | Decides whether an 'AND' or an 'OR' condition is used between all high_urgency threshold params. | bool | True
{ "high_urgency_require_all_criteria": false }
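How the urgency parameters could combine is sketched below in Python. This is a hedged illustration only: `classify_urgency` is a hypothetical function, it assumes that high_urgency_require_all_criteria set to True means 'AND' and False means 'OR', and it assumes a threshold is met when the value is greater than or equal to it. Only two of the high_urgency thresholds are modelled.

```python
from typing import Optional


def classify_urgency(
    default_urgency: str,
    segment_size: int,
    size_fraction: float,
    high_min_size: Optional[int] = None,
    high_min_fraction: Optional[float] = None,
    require_all: bool = True,
) -> str:
    if default_urgency == "high":
        # All insights are high urgency; thresholds are ignored.
        return "high"
    checks = []
    if high_min_size is not None:
        checks.append(segment_size >= high_min_size)
    if high_min_fraction is not None:
        checks.append(size_fraction >= high_min_fraction)
    if not checks:
        return "normal"
    # Assumed semantics: True combines with AND, False with OR.
    met = all(checks) if require_all else any(checks)
    return "high" if met else "normal"


print(classify_urgency("normal", 1500, 0.1, 1000, 0.2, require_all=True))   # normal
print(classify_urgency("normal", 1500, 0.1, 1000, 0.2, require_all=False))  # high
```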

Visuals and Enrichments Params

field_vectors
Name | Description | Type | Default
field_vectors | This attribute lists metric vectors for the FE to show on an insight card of this verse. A value in this field can either be a string (in which case the string should correspond to a kapi_vector name in the config) or an array (in which case the array should be treated as an ad-hoc kapi vector defined specifically for this verse). | FieldVectorsList | ()
{ "field_vectors": [ "field_vector_group_1", "field_vector_group_2", "field_vector_group_3" ] }
investigate_no_drill
Name | Description | Type | Default
investigate_no_drill | Dictates the link to the investigations page to add to the found insights. If True, the link will point to investigations page with a drilldown to the segment that was found. If it's false the link will point to the investigations page without drilldown, but with the found segment selected, so it can be compared with a benchmark of a higher level. | bool | False
{ "investigate_no_drill": true }

Wizard Params

cadence
Name | Description | Type | Default
cadence | The cadence for evaluation of this verse. Only the following cadences are valid: Minutes: 1m, 5m, 10m, 15m, 20m, 30m. Hours: 1h, 2h, 3h, 4h, 6h, 8h, 12h. Days: 1d, 2d, 3d, 4d, 5d, 6d. Weeks: 1w, 2w, 3w, 4w, 5w. | Cadence | 1d
{ "cadence": "6h" }
default_urgency
Name | Description | Type | Default
default_urgency | The urgency class for insights created using this verse. Currently supports two values: "normal" (default) and "high". If set to "normal", then specific thresholds for "high" urgency can be set using other parameters prefixed with "high_urgency_". If set to "high", then threshold parameters prefixed with "high_urgency_" are not considered at all, since all insights of this verse will be considered as having a "high" urgency. | Urgency | normal
{ "default_urgency": "high" }
min_segment_size
Name | Description | Type | Default
min_segment_size | Minimal segment size which a segment must have in order to be considered in the search. | PositiveInt | 1
{ "min_segment_size": 100 }
min_segment_size_fraction
Name | Description | Type | Default
min_segment_size_fraction | Minimal segment size, as a fraction of the baseline segment, which a segment must have in order to be considered in the search. | InclusiveFraction | 0
{ "min_segment_size_fraction": 0.05 }
name
Name | Description | Type | Default
name | (Required) The name of the verse. Please note, a verse's name must be different from other verses in the same stanza. | str | None
{ "name": "confidence_outliers" }
segment_by
Name | Description | Type | Default
segment_by | The dimensions to use to segment the data in order to search for anomalies. This list must be a sublist of all arc class' dimensions. Limiting the possible values of a specific segmentation field on which insights can be generated can be done using the "avoid_values" and the "include_only_values" keys in the segmentation JSON object, as seen in the example. | SegmentationsList | ()
{ "segment_by": [ "city", "bot_id", {"name": "provider-code", "avoid_values": ["zoom"]}, {"name": "selected-language", "avoid_values": ["eng", "spa"]}, {"name": "country", "include_only_values": ["jpn"]} ] }
time_period
Name | Description | Type | Default
time_period | Latest time period to look for new segments in. Expected format is "<number><unit>", where <number> can be any positive integer and <unit> options currently include "d" (days) or "w" (weeks). E.g., "1d" means a 1-day period. | TimePeriodOrEmpty | 1d
{ "time_period": "1w" }