Master your Metrics with Calibration

Wissam Siblini; Jordan Fréry; Liyun He-Guelton; Frédéric Oblé; Yi-Qing Wang

arXiv:1909.02827·cs.LG·April 29, 2020

TL;DR

This paper introduces a calibration method for precision-based metrics like F1-score and AUC-PR, making them invariant to class prior and improving interpretability across subpopulations and periods.

Contribution

It proposes calibrating precision-based metrics so that they become invariant to the class prior, which makes scores easier to interpret and compare across subpopulations and subperiods in real-world model evaluation.

Findings

01

Calibrated metrics are less dependent on class prior.

02

Improved interpretability of model performance over subpopulations.

03

Enhanced control over what is measured in model evaluation.

Abstract

Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as F1-score or AUC-PR (Area Under the Curve of Precision Recall). Heavily dependent on the class prior, such metrics make it difficult to interpret the variation of a model's performance over different subpopulations/subperiods in a dataset. In this paper, we propose a way to calibrate the metrics so that they can be made invariant to the prior. We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of calibrated metrics and show that they improve interpretability and provide a better control over what is really measured. We describe specific real-world use-cases where calibration is beneficial such as, for instance, model monitoring in production, reporting, or fairness evaluation.
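To make the idea of prior invariance concrete, here is a minimal Python sketch. It is not taken from the paper's repository. It assumes the calibration follows from the standard identity precision = π·TPR / (π·TPR + (1 − π)·FPR), with the observed positive ratio π replaced by a fixed reference ratio π0. The function names `calibrated_precision` and `calibrated_f1` and the default reference ratio of 0.5 are illustrative assumptions.

```python
def calibrated_precision(tp, fp, prior, reference_prior=0.5):
    """Precision rescaled as if the positive-class ratio were `reference_prior`.

    Assumption: derived from precision = pi*TPR / (pi*TPR + (1 - pi)*FPR)
    by substituting the reference ratio for the observed ratio `prior`.
    """
    if not (0.0 < prior < 1.0 and 0.0 < reference_prior < 1.0):
        raise ValueError("prior and reference_prior must lie strictly between 0 and 1")
    # Reweight false positives to what they would be under the reference ratio.
    weight = prior * (1.0 - reference_prior) / (reference_prior * (1.0 - prior))
    denom = tp + weight * fp
    return tp / denom if denom > 0 else 0.0


def calibrated_f1(tp, fp, fn, prior, reference_prior=0.5):
    """F1 computed from calibrated precision and ordinary recall.

    Recall does not depend on the class prior, so it is left unchanged.
    """
    precision_c = calibrated_precision(tp, fp, prior, reference_prior)
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    total = precision_c + recall
    return 2.0 * precision_c * recall / total if total > 0 else 0.0


if __name__ == "__main__":
    # Same hypothetical classifier (TPR = 0.8, FPR = 0.1) on two populations
    # of 10,000 examples with different positive ratios.
    balanced = dict(tp=4000, fp=500, fn=1000, prior=0.5)
    imbalanced = dict(tp=80, fp=990, fn=20, prior=0.01)
    for name, counts in (("balanced", balanced), ("imbalanced", imbalanced)):
        plain = counts["tp"] / (counts["tp"] + counts["fp"])
        calibrated = calibrated_precision(counts["tp"], counts["fp"], counts["prior"])
        print(name, round(plain, 4), round(calibrated, 4))
```

In this made-up example the classifier behaves identically on both populations, yet plain precision falls from about 0.89 to about 0.07 when the positive ratio drops from 50% to 1%. The calibrated value stays at about 0.89 in both cases, which is the kind of comparability across subpopulations the abstract describes.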

Code & Models

Repositories

wissam-sib/calibrated_metrics
Official

Taxonomy

Methods · Interpretability