Drift Detection
The first verse type we will use is "AverageDrift". This verse type configures Mona to find segments in which a metric's average differs significantly between a target dataset and a benchmark dataset.
In our case, we want to track the average of the following metrics:
"credit_score", "offered_amount", "approved_amount", "credit_label_delta" and "credit_label_abs_delta".
We want to track these metrics in different segments of our data, using several segmentations, such as: "occupation", "purpose", "stage", "model_version" and "city".
Step 1. Create your first Stanza
Let's define this verse via Mona's config GUI.
Each verse you define in Mona must be inside a stanza. Each stanza groups verses with similar parameters and can hold as many verses as needed. So let's first create our stanza.
- On the configurations page, under the "Stanzas" tab, click on "Add stanza".
- Now you can name the stanza. We will call it "general".
Step 2. Add your first Verse.
Now let's create our verse.
- Under the "Verses" tab, click on the add button.
- On the left, you will see a list of all possible verse types. Let's choose AverageDrift.
The verse window groups the params that can be defined for the verse into categories. The first category, "Basic", holds all the basic parameters needed to configure the verse.
Every verse param has a default value, which is used unless you override it.
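To make the override rule concrete, here is a minimal sketch of "defaults first, overrides on top". The names AVERAGE_DRIFT_DEFAULTS and effective_params are hypothetical and not part of Mona. Only the 2w/6w period defaults come from this guide; the other defaults are left as None because their actual values are not listed here.

```python
from typing import Any

# Hypothetical defaults table. The 2w/6w periods are the defaults this guide
# mentions; None marks defaults whose real values are not stated here.
AVERAGE_DRIFT_DEFAULTS: dict[str, Any] = {
    "target_set_period": "2w",
    "benchmark_set_period": "6w",
    "min_anomaly_level": None,
    "min_segment_size_fraction": None,
}


def effective_params(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return the params a verse would run with: defaults, with overrides applied on top."""
    merged = dict(defaults)  # copy so the defaults table is never mutated
    merged.update(overrides)
    return merged


params = effective_params(AVERAGE_DRIFT_DEFAULTS, {"min_anomaly_level": 0.2})
print(params["min_anomaly_level"], params["target_set_period"])  # 0.2 2w
```

Copying before updating keeps the defaults table reusable for every verse that does not override a given param.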
Step 3. Define "Basic" verse params.
For each of the following params, click on "override" and enter the value shown:
metrics - "credit_score", "offered_amount", "approved_amount", "credit_label_delta", "credit_label_abs_delta".
segment_by - "occupation", "purpose", "city".
min_anomaly_level - 0.2.
min_segment_size_fraction - 0.005.
All verses added or edited in the GUI are reflected in the configuration JSON file, which can be downloaded from the configurations page.
Here is how this verse will be defined in our JSON config file:
{
  "stanzas": {
    "general": {
      "verses": [
        {
          "type": "AverageDrift",
          "metrics": [
            "credit_score",
            "offered_amount",
            "approved_amount",
            "credit_label_delta",
            "credit_label_abs_delta"
          ],
          "segment_by": [
            "occupation",
            "purpose",
            "city"
          ],
          "baseline_segment": {
            "stage": [
              {
                "value": "inference"
              }
            ]
          },
          "target_set_period": "2w",
          "benchmark_set_period": "6w",
          "min_anomaly_level": 0.2,
          "min_segment_size_fraction": 0.005
        }
      ]
    }
  }
}
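Because the downloaded config is plain JSON, you can inspect it with any JSON library. The sketch below walks the stanza → verses nesting and lists each verse. It uses a trimmed inline copy of the config (only two metrics kept) so it runs on its own; the function name list_verses is hypothetical, and in practice you would load the file you downloaded instead.

```python
import json

# Trimmed copy of the config above. To use a downloaded file instead:
#   with open(path) as f:
#       config = json.load(f)
CONFIG_JSON = """
{
  "stanzas": {
    "general": {
      "verses": [
        {
          "type": "AverageDrift",
          "metrics": ["credit_score", "offered_amount"],
          "segment_by": ["occupation", "purpose", "city"]
        }
      ]
    }
  }
}
"""


def list_verses(config: dict) -> list[tuple[str, str, list[str]]]:
    """Return (stanza name, verse type, metrics) for every verse in the config."""
    rows = []
    for stanza_name, stanza in config["stanzas"].items():
        for verse in stanza.get("verses", []):
            rows.append((stanza_name, verse["type"], verse.get("metrics", [])))
    return rows


for row in list_verses(json.loads(CONFIG_JSON)):
    print(row)  # ('general', 'AverageDrift', ['credit_score', 'offered_amount'])
```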
With this verse configuration, we override the default values of the AverageDrift verse type (set in "type") and look for statistically significant changes in the average of the given "metrics" within any specific value of the segmentation fields ("segment_by"), or within any intersection of such values.
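To illustrate what "any specific value or any intersection of values" means, the sketch below enumerates candidate segments as {field: value} dicts, first for single fields and then for pairs of fields. This is only an illustration of the idea, not Mona's search strategy. The function name, the max_fields cap, and the "occupation" values are hypothetical; "Overland Park_Kansas" is the city value from the insight example in this guide.

```python
from itertools import combinations, product
from typing import Iterator


def candidate_segments(values_by_field: dict[str, list[str]],
                       max_fields: int = 2) -> Iterator[dict[str, str]]:
    """Yield segments for single fields and for intersections of up to max_fields fields."""
    fields = sorted(values_by_field)
    for k in range(1, max_fields + 1):
        for combo in combinations(fields, k):
            # Every combination of one value per chosen field is one segment.
            for values in product(*(values_by_field[f] for f in combo)):
                yield dict(zip(combo, values))


# Hypothetical field values for illustration.
segments = list(candidate_segments({
    "occupation": ["engineer", "teacher"],
    "city": ["Overland Park_Kansas", "Austin_Texas"],
}))
print(len(segments))  # 8: 4 single-field segments + 4 two-field intersections
```

The number of intersections grows quickly with more fields and values, which is one reason a minimum segment size (see "min_segment_size_fraction") is useful.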
Since we only want to track these metrics for inference runs, we added a "baseline_segment" param that defines the baseline: only records whose "stage" is "inference" are considered.
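The "baseline_segment" value has the shape {field: [{"value": ...}, ...]}. The sketch below shows one plausible way to read it as a record filter. It assumes, without confirmation from this guide, that several entries for one field are alternatives (OR) and that several fields must all match (AND). The function name and the "training" stage value are hypothetical.

```python
def matches_segment(record: dict, segment_spec: dict[str, list[dict]]) -> bool:
    """Check a record against a spec like {"stage": [{"value": "inference"}]}.

    Assumed semantics: values listed under one field are OR'ed,
    and different fields are AND'ed.
    """
    for field, allowed in segment_spec.items():
        allowed_values = {item["value"] for item in allowed}
        if record.get(field) not in allowed_values:
            return False
    return True


# Hypothetical records; the field names mirror this guide, the values are made up.
records = [
    {"stage": "inference", "credit_score": 0.27},
    {"stage": "training", "credit_score": 0.31},
    {"stage": "inference", "credit_score": 0.22},
]
baseline = {"stage": [{"value": "inference"}]}
baseline_rows = [r for r in records if matches_segment(r, baseline)]
print(len(baseline_rows))  # 2
```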
For time periods, we use the default values: 2 weeks for the "target" dataset and 6 weeks for the "benchmark" dataset.
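The sketch below turns period strings such as "2w" and "6w" into concrete time windows. It assumes that "w" means weeks and "d" means days, that the target window is the most recent period, and that the benchmark window immediately precedes it. This guide does not confirm how Mona places the benchmark window, so treat the layout as an assumption. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed units: only days ("d") and weeks ("w") are handled in this sketch.
UNIT_DAYS = {"d": 1, "w": 7}


def parse_period(period: str) -> timedelta:
    """Convert a string like "2w" or "10d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dw])", period)
    if match is None:
        raise ValueError(f"unsupported period: {period!r}")
    return timedelta(days=int(match.group(1)) * UNIT_DAYS[match.group(2)])


def windows(now: datetime, target_period: str, benchmark_period: str):
    """Return ((benchmark_start, benchmark_end), (target_start, target_end))."""
    target_start = now - parse_period(target_period)
    benchmark_start = target_start - parse_period(benchmark_period)
    return (benchmark_start, target_start), (target_start, now)


benchmark, target = windows(datetime(2024, 3, 1), "2w", "6w")
print(benchmark[0].date(), target[0].date())  # 2024-01-05 2024-02-16
```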
We use "min_anomaly_level" to specify that a drift is reported only when the difference between the benchmark and target averages is at least 0.2 standard deviations.
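As a rough worked example of "a change of at least 0.2 standard deviations", the sketch below measures the absolute difference between the target and benchmark averages in units of the benchmark's sample standard deviation. Using the benchmark's spread is an assumption; Mona's exact statistic may differ. The function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def anomaly_level(benchmark: list[float], target: list[float]) -> float:
    """|mean(target) - mean(benchmark)| in units of the benchmark's sample std (assumed)."""
    spread = stdev(benchmark)  # needs at least two benchmark values
    if spread == 0:
        raise ValueError("benchmark has zero variance")
    return abs(mean(target) - mean(benchmark)) / spread


def is_drift(level: float, min_anomaly_level: float = 0.2) -> bool:
    """Report a drift only when the level reaches the configured threshold."""
    return level >= min_anomaly_level


# Benchmark mean 3.0 with std ~1.581; target mean 4.0, so level ~0.632.
level = anomaly_level([1.0, 2.0, 3.0, 4.0, 5.0], [4.0, 4.0, 4.0])
print(round(level, 3), is_drift(level))  # 0.632 True
```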
The "min_segment_size_fraction" param filters out segments smaller than 0.5% of the data.
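The size filter is simple arithmetic: a segment is kept only if its share of the data reaches the threshold. With 10,000 records and a threshold of 0.005, a segment needs at least 50 records. The sketch below assumes the boundary is inclusive, which this guide does not state; the function name is hypothetical.

```python
def large_enough(segment_rows: int, total_rows: int, min_fraction: float = 0.005) -> bool:
    """Keep a segment only if it holds at least min_fraction of all rows (inclusive, assumed)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return segment_rows / total_rows >= min_fraction


print(large_enough(40, 10_000))  # False: 0.4% of the data
print(large_enough(60, 10_000))  # True: 0.6% of the data
```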
Low thresholds
Note that we set low values for thresholds such as min_anomaly_level and min_segment_size_fraction so that Mona surfaces insights early. Once insights start coming in, you can raise these thresholds to keep only the relevant and significant ones.
Step 4. Save new verse and stanza.
Once all params have been defined, click on "Add verse", and then "Add stanza".
Once this is defined and saved in the config, Mona will start searching for anomalies that match these parameters. When done, new insights will be generated on the insights page.
You can configure Mona to send notifications on new insights via Email, Slack, PagerDuty, and more. We will go over this in the next chapters.
Step 5. Check new insights.
Once a verse is added or edited, Mona's insight generator starts working immediately, but it might take a few minutes until insights are ready.
AverageDrift insights that match these params will look like this:
This insight shows a drift in the average of "credit_score" in the segment "city": "Overland Park_Kansas". The drift is an increase from 0.21 to 0.29, with an anomaly level of 0.38.
Clicking the insight card opens the single insight page, which shows additional information about this anomaly.
In addition to the data shown in the insight card, this page also shows the distribution of "credit_score" values.
More information on how to read an insight can be found here
References:
Notifications