The Dataset
In general, Mona can be configured to monitor data from all types of AI models and their business environments and can track anything that would be useful in assessing the entire AI system’s behavior. This can include metadata fields (e.g., geographical information), model inputs (features), model outputs (e.g., the credit score), and business outcomes or ground truth labels when those exist (e.g., whether a loan was paid back on time).
The Tutorial Dataset:
Training and Test
In the GitHub repo, the "training" directory contains zip files with the train and test datasets. Each data point in these sets contains all the data fields used for credit analysis: metadata fields such as the requester's occupation, city, state, and loan purpose, and the amounts of money offered and approved in the application; several numerical input features; system metadata such as the model version and the stage (train or test); and the model output (credit_score) together with the ground truth label used for training and testing.
Example training data
{
"timestamp": 1611835417000,
"occupation": "education",
"city": "Fayetteville",
"state": "Arkansas",
"id": "dfa3353a692e7dc4d8d48102c4ff345f_train",
"purpose": "Car insurance",
"credit_score": 0.5731902999954619,
"loan_taken": true,
"offered_amount": 4334,
"feature_0": 2654996.459602782,
"feature_1": 2.5483001715549025,
"feature_2": 727,
"feature_3": 6775,
"feature_4": 0.4792214869935133,
"feature_5": 1.5258190130021834,
"feature_6": 23,
"feature_7": 1.030829156176713,
"feature_8": 1337.4128221798737,
"feature_9": 39.95245825137266,
"stage": "train",
"return_until": 1613045017000,
"label": 0,
"model_version": "v1"
}
Example test data
{
"timestamp": 1611842438000,
"occupation": "education",
"city": "Phoenix",
"state": "Arizona",
"id": "7b026dd53a16519ef421e6c1f4ffa8ca_test",
"purpose": "Home services",
"credit_score": 0.024648770897898548,
"loan_taken": true,
"offered_amount": 17597,
"feature_0": 2816812.026366597,
"feature_1": 0.5520508993135393,
"feature_2": 601,
"feature_3": 12003,
"feature_4": 0.02760948701383037,
"feature_6": 7,
"feature_7": 8.160327989686898,
"feature_8": 5232.67532775442,
"feature_9": 43.65756482626466,
"stage": "test",
"return_until": 1613052038000,
"label": 0,
"model_version": "v1"
}
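The `timestamp` and `return_until` values in the examples above look like Unix epoch times in milliseconds. The tutorial does not state the unit, so treat that reading as an assumption. Under it, a minimal Python sketch (the helper name `ms_to_utc` is hypothetical) converts both fields to UTC datetimes and measures the gap between them:

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (assumed unit)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Values copied from the example training record above.
record = {"timestamp": 1611835417000, "return_until": 1613045017000}

created = ms_to_utc(record["timestamp"])
due = ms_to_utc(record["return_until"])

# Difference between the two fields, as a timedelta.
repayment_window = due - created

print(created.isoformat())
print(repayment_window.days)
```

If the millisecond assumption holds, the two fields in this example record are 14 days apart, which suggests that `return_until` marks the repayment deadline.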
Inference
The inference data is found in the "loans_inference_time_data" directory. It contains the same fields as the training data, except for the ground truth label.
Example inference data
{
"timestamp": 1611930089000,
"occupation": "technology",
"city": "Newark",
"state": "New Jersey",
"id": "2f719b2936f06da8d7c79ab9c6c39923",
"purpose": "Home construction",
"credit_score": 0.021553255527124074,
"loan_taken": true,
"offered_amount": 8700,
"approved_amount": 4900,
"feature_0": 3391470.0843921946,
"feature_1": 0.843949135322643,
"feature_2": 590,
"feature_3": 13736,
"feature_4": 0.2720732559814525,
"feature_5": 0.9777986328271696,
"feature_6": 10,
"feature_7": 5.429877032113464,
"feature_8": 1223.328832340112,
"feature_9": 131.82714905405732,
"stage": "inference",
"return_until": 1613139689000,
"model_version": "v1"
}
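One way to check how the inference records differ from the training records is to compare their field names. The sketch below is only an illustration: the field sets are copied by hand from the example records on this page, and `compare_fields` is a hypothetical helper, not part of Mona.

```python
# Field names copied from the example training and inference records.
FEATURES = {f"feature_{i}" for i in range(10)}

COMMON_FIELDS = {
    "timestamp", "occupation", "city", "state", "id", "purpose",
    "credit_score", "loan_taken", "offered_amount", "stage",
    "return_until", "model_version",
} | FEATURES

TRAINING_FIELDS = COMMON_FIELDS | {"label"}
INFERENCE_FIELDS = COMMON_FIELDS | {"approved_amount"}


def compare_fields(reference: set, candidate: set) -> tuple:
    """Return (fields only in reference, fields only in candidate)."""
    return reference - candidate, candidate - reference


missing, extra = compare_fields(TRAINING_FIELDS, INFERENCE_FIELDS)
print(sorted(missing))  # fields the inference example lacks
print(sorted(extra))    # fields only the inference example has
```

For the records shown here, the inference example lacks `label`, as expected, and additionally carries `approved_amount`, which the example training record does not include.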
Feedback
The feedback data is found in the "loans_feedback_status" directory and records whether each loan was paid back on time. These messages have the same ids as the inference data, which allows Mona to merge them even if they are exported at different times and from different places.
Example feedback data
{
"id": "2f719b2936f06da8d7c79ab9c6c39923",
"loan_paid_back": false,
"timestamp": 1611930089000
}
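The id-based merge described above can be pictured as a simple join on the `id` field. The following Python sketch is not Mona's implementation; `merge_by_id` is a hypothetical helper, and the inference record is abbreviated to a few fields from the example.

```python
def merge_by_id(inference_records, feedback_records):
    """Join feedback messages onto inference records that share the same "id"."""
    # Copy each inference record so the inputs are left unmodified.
    merged = {rec["id"]: dict(rec) for rec in inference_records}
    for feedback in feedback_records:
        target = merged.get(feedback["id"])
        if target is None:
            continue  # this sketch skips feedback with no matching inference record
        # Feedback fields are added; same-named fields are overwritten.
        target.update(feedback)
    return list(merged.values())


# Abbreviated versions of the example inference and feedback messages.
inference = [{
    "id": "2f719b2936f06da8d7c79ab9c6c39923",
    "timestamp": 1611930089000,
    "credit_score": 0.021553255527124074,
    "stage": "inference",
}]
feedback = [{
    "id": "2f719b2936f06da8d7c79ab9c6c39923",
    "loan_paid_back": False,
    "timestamp": 1611930089000,
}]

merged = merge_by_id(inference, feedback)
print(merged[0]["credit_score"], merged[0]["loan_paid_back"])
```

Because the join key is the `id`, the order in which the two kinds of messages arrive does not matter in this sketch; a feedback message only needs a matching inference record to attach to.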
Note
Mona can accept any JSON message, so your dataset might look completely different from the fake example dataset used in this tutorial.