Finding Outlier Segments
The second verse type we will use is "AverageOutlier". This verse type configures Mona to find segments whose average for a given metric differs significantly from that of a baseline segment in a given time range.
For example, we might want to find out whether a specific "city" has a higher average "credit_score" than the rest of the cities.
In our case, we want to check whether the average of any of the following metrics ("credit_score", "offered_amount", "approved_amount", "credit_label_delta" and "offered_approved_delta_normalized") is significantly different for one of the values of our segmentations: "occupation", "purpose", "stage", "model_version" and "city".
Step 1. Add a new AverageOutlier verse.
- On the configurations page, under the "Stanzas" tab, go to the "general" stanza and click on the edit button on the right.
- Go to the "Verses" tab and click on "Add new verse".
- Again, on the left, you will see a list of all possible verse types. Let's choose AverageOutlier.
The verse window is divided into categories, one for each group of params that can be defined in the verse. The first category is "Basic", and it holds all the basic params needed to configure the verse.
Every verse param has a default value, which is used if no overriding value is given.
Step 2. Define "Basic" verse params.
For each of the following params, click on "override" and set the value shown:
metrics - "credit_score", "offered_amount", "approved_amount", "credit_label_delta", "offered_approved_delta_normalized".
segment_by - "occupation", "purpose", "city".
min_anomaly_level - 0.3.
min_segment_size_fraction - 0.005.
time_resolution - "1w".
All verses added or edited in the GUI are reflected in the configuration JSON file, which can be downloaded from the configurations page.
Here is how this verse is defined in our JSON config file:
{
  "stanzas": {
    "name_of_stanza": {
      "verses": [
        {
          "type": "AverageOutlier",
          "metrics": [
            "credit_score",
            "offered_amount",
            "approved_amount",
            "credit_label_delta",
            "offered_approved_delta_normalized"
          ],
          "segment_by": [
            "occupation",
            "purpose",
            "city"
          ],
          "baseline_segment": {
            "stage": [
              {
                "value": "inference"
              }
            ]
          },
          "min_anomaly_level": 0.3,
          "min_segment_size_fraction": 0.005,
          "time_resolution": "1w"
        }
      ]
    }
  }
}
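Because the GUI and the JSON file describe the same verse, one way to double-check an edit is to parse the downloaded file and print the overridden params. The following is only a minimal sketch: the embedded string stands in for the downloaded file (its name and location are not specified here), and `find_verses` is a hypothetical helper written for this example, not part of any Mona SDK.

```python
import json

# Stand-in for the downloaded configuration file. In practice this text
# would be read from wherever the file was saved (the path is an assumption).
CONFIG_TEXT = """
{
  "stanzas": {
    "name_of_stanza": {
      "verses": [
        {
          "type": "AverageOutlier",
          "metrics": ["credit_score", "offered_amount", "approved_amount",
                      "credit_label_delta", "offered_approved_delta_normalized"],
          "segment_by": ["occupation", "purpose", "city"],
          "baseline_segment": {"stage": [{"value": "inference"}]},
          "min_anomaly_level": 0.3,
          "min_segment_size_fraction": 0.005,
          "time_resolution": "1w"
        }
      ]
    }
  }
}
"""


def find_verses(config, verse_type):
    """Hypothetical helper: yield (stanza_name, verse) for each verse of a type."""
    for stanza_name, stanza in config.get("stanzas", {}).items():
        for verse in stanza.get("verses", []):
            if verse.get("type") == verse_type:
                yield stanza_name, verse


config = json.loads(CONFIG_TEXT)
outlier_verses = list(find_verses(config, "AverageOutlier"))

# Print the params that were overridden in Step 2.
for stanza_name, verse in outlier_verses:
    print(stanza_name, verse["segment_by"], verse["min_anomaly_level"],
          verse["min_segment_size_fraction"], verse["time_resolution"])
```

A check like this makes it easy to spot a value in the file that differs from what was typed in the GUI.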
With this verse configuration, we override some of the default values of AverageOutlier (the verse's "type"). We are looking for segments whose average of the given "metrics" is significantly different from that of their sibling segments, where a segment is any specific value of a segmentation field listed in "segment_by" (or any intersection of such values).
As we only want to track the above metrics in their inference runs, we added a "baseline_segment" param to define the baseline.
For time periods, we are using the default value, which is 4 weeks.
We are using "min_anomaly_level" to define that a segment is an outlier when the difference in averages between the segment and its siblings is at least 0.3 standard deviations.
The "min_segment_size_fraction" param will filter out segments that are smaller than 0.5% of the data.
Lastly, "time_resolution" sets the resolution of the time series chart on the insight page for insights created by this configuration.
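To make the two thresholds more concrete, here is a rough sketch of the kind of comparison they describe. It is not Mona's implementation: the exact statistic is not specified here, so the sketch assumes the anomaly level is the absolute difference in averages divided by the standard deviation of the metric over all rows, and it treats "siblings" as all rows with a different value of the same field. The function name `average_outliers` and the sample rows are made up for illustration, and the baseline segment and time range are not modeled.

```python
from collections import defaultdict
from statistics import mean, pstdev

MIN_ANOMALY_LEVEL = 0.3
MIN_SEGMENT_SIZE_FRACTION = 0.005


def average_outliers(rows, metric, segment_field,
                     min_anomaly_level=MIN_ANOMALY_LEVEL,
                     min_size_fraction=MIN_SEGMENT_SIZE_FRACTION):
    """Return {segment_value: anomaly_level} for segments whose average
    differs enough from the average of all other rows (the "siblings")."""
    values_by_segment = defaultdict(list)
    for row in rows:
        values_by_segment[row[segment_field]].append(row[metric])

    # Assumption: the spread is the std of the metric over all rows.
    spread = pstdev([row[metric] for row in rows])
    if spread == 0:
        return {}

    outliers = {}
    for segment, values in values_by_segment.items():
        # Skip segments that hold too small a share of the data.
        if len(values) / len(rows) < min_size_fraction:
            continue
        rest = [row[metric] for row in rows if row[segment_field] != segment]
        if not rest:
            continue
        level = abs(mean(values) - mean(rest)) / spread
        if level >= min_anomaly_level:
            outliers[segment] = round(level, 3)
    return outliers


# Made-up records, for illustration only.
rows = (
    [{"purpose": "Medical insurance", "credit_score": 0.9}] * 10
    + [{"purpose": "Car", "credit_score": 0.5}] * 45
    + [{"purpose": "Home", "credit_score": 0.5}] * 45
)
print(average_outliers(rows, "credit_score", "purpose"))
# Raising the size threshold drops the small "Medical insurance" segment.
print(average_outliers(rows, "credit_score", "purpose", min_size_fraction=0.2))
```

With thresholds this low, several segments can pass at once, which is why raising them later narrows the results to the most significant ones.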
Low thresholds
Note that we are using low values for thresholds such as min_anomaly_level and min_segment_size_fraction in order to make sure we get insights. Once insights appear, users can raise the thresholds to keep only relevant and significant insights.
Step 3. Save new verse and stanza.
Once all params have been defined, click on "Add verse", and then "Add stanza".
Once this is defined and saved in the config, Mona will start searching for anomalies that match these parameters. When done, new insights will be generated on the insights page.
You can configure Mona to send notifications on new insights via Email, Slack, PagerDuty, and more. We will go over this in the next chapters.
Step 4. Check new insights.
Once a verse is added or edited, Mona's insight generator starts working immediately, but it might take a few minutes until insights are ready.
AverageOutlier insights that match these params will look like this:
This insight shows a segment ("purpose": "Medical insurance") that has a significantly different average of "credit_score" than this segment's siblings.
The average "credit_score" in this segment is 0.32, compared to 0.25 in the rest of the data. The anomaly level is 0.34.
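As a quick sanity check of these numbers, the arithmetic can be worked through as below. This is only an illustration under an assumption: it treats the anomaly level as the difference in averages divided by a standard deviation, in line with the description of "min_anomaly_level" above. The implied standard deviation is derived from the three reported values and is not a figure shown in the insight.

```python
# Values reported in the insight above.
segment_average = 0.32
rest_average = 0.25
anomaly_level = 0.34

# Threshold from the verse configuration.
min_anomaly_level = 0.3

difference = segment_average - rest_average
# Assumption: anomaly_level = difference / standard deviation.
implied_std = difference / anomaly_level

print(f"difference in averages: {difference:.2f}")
print(f"implied standard deviation: {implied_std:.3f}")
print("passes min_anomaly_level:", anomaly_level >= min_anomaly_level)
```

Since 0.34 is only slightly above the 0.3 threshold, raising "min_anomaly_level" a little would be enough to filter this insight out.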
Clicking on the insight card opens the single insight page, which shows additional information about this anomaly.
More information on how to read an insight can be found here
References:
Notifications