Reassessing How to Compare and Improve the Calibration of Machine Learning Models
Muthu Chidambaram, Rong Ge

TL;DR
This paper critically examines calibration metrics in machine learning, introduces a new visualization method to jointly assess calibration and generalization, and provides theoretical insights into calibration errors and estimator consistency.
Contribution
It reveals limitations of current calibration evaluation practices, proposes an extended reliability diagram for joint visualization, and proves new theoretical results on calibration errors and estimator consistency.
Findings
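Trivial recalibration methods can appear state-of-the-art when only calibration and accuracy metrics are reported, without a generalization metric such as negative log-likelihood.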
The new visualization detects calibration-generalization trade-offs.
Theoretical proofs relate calibration errors and establish estimator consistency.
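The first finding can be illustrated with a small synthetic experiment. The sketch below is not the paper's code or experimental setup: the data are simulated, and `trivial_recalibrate` is a hypothetical recalibrator written for this illustration. It keeps each predicted class but replaces every confidence with a single constant (the overall accuracy). Labels are sampled from the model's own probabilities, so the original model is calibrated by construction. The binned ECE estimator with 15 equal-width bins is likewise an assumption of this sketch.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def binned_ece(conf, correct, n_bins=15):
    # Equal-width binned estimate of top-label expected calibration error.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece


def nll(probs, labels):
    # Mean negative log-likelihood of the true labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))


def trivial_recalibrate(probs, target_conf):
    # Hypothetical trivial recalibrator (illustration only): keep the argmax,
    # set its probability to target_conf, spread the rest uniformly.
    # Assumes target_conf > 1/k so the argmax is preserved.
    n, k = probs.shape
    out = np.full((n, k), (1.0 - target_conf) / (k - 1))
    out[np.arange(n), probs.argmax(axis=1)] = target_conf
    return out


def summarize(probs, labels):
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    return {
        "acc": correct.mean(),
        "ece": binned_ece(probs.max(axis=1), correct),
        "nll": nll(probs, labels),
    }


def demo(n=5000, k=10, seed=0):
    rng = np.random.default_rng(seed)
    probs = softmax(2.0 * rng.standard_normal((n, k)))
    # Sample labels from the model's own probabilities (inverse-CDF sampling),
    # so the model is calibrated by construction.
    u = rng.random(n)
    labels = np.minimum((probs.cumsum(axis=1) < u[:, None]).sum(axis=1), k - 1)

    model = summarize(probs, labels)
    # For simplicity the constant is fit on the same data it is evaluated on;
    # a held-out split would be used in practice.
    trivial = summarize(trivial_recalibrate(probs, model["acc"]), labels)
    return {"model": model, "trivial": trivial}


if __name__ == "__main__":
    for name, metrics in demo().items():
        print(name, {key: round(float(val), 4) for key, val in metrics.items()})
```

In this setup both predictors have identical accuracy, and the trivial one attains a binned ECE of essentially zero because all its confidences fall in one bin whose average equals the accuracy. Only the negative log-likelihood separates the two, since the constant-confidence predictor discards all per-example information.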
Abstract
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. This property has become increasingly important as the impact of machine learning models has continued to spread to various domains. As a result, there are now a dizzying number of recent papers on measuring and improving the calibration of (specifically deep learning) models. In this work, we reassess the reporting of calibration metrics in the recent literature. We show that there exist trivial recalibration approaches that can appear seemingly state-of-the-art unless calibration and prediction metrics (i.e. test accuracy) are accompanied by additional generalization metrics such as negative log-likelihood. We then use a calibration-based decomposition of Bregman divergences to develop a new extension to reliability…
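The definition in the abstract's first sentence can be written out formally. The notation below is an assumption made for this note and may differ from the paper's: $g$ is a model mapping inputs to the probability simplex over $k$ classes, $Y$ is the class label, and $e_Y$ is its one-hot encoding. The last identity is the standard calibration decomposition for a Bregman divergence $d_\phi$, given here as general background rather than as the paper's exact statement.

```latex
% Full calibration: the predicted vector equals the conditional class frequencies.
\[
  \mathbb{E}\left[ e_Y \mid g(X) \right] = g(X) \quad \text{almost surely.}
\]

% Top-label (confidence) calibration and the associated expected calibration error.
\[
  \hat{y}(X) = \arg\max_{j} g_j(X), \qquad \hat{p}(X) = \max_{j} g_j(X),
\]
\[
  \mathrm{ECE}(g) = \mathbb{E}\left[ \left| \mathbb{P}\left( Y = \hat{y}(X) \mid \hat{p}(X) \right) - \hat{p}(X) \right| \right].
\]

% Calibration decomposition of a Bregman divergence d_\phi,
% with c(X) the conditional label distribution given the prediction.
\[
  \mathbb{E}\left[ d_\phi\left( e_Y, g(X) \right) \right]
  = \mathbb{E}\left[ d_\phi\left( e_Y, c(X) \right) \right]
  + \mathbb{E}\left[ d_\phi\left( c(X), g(X) \right) \right],
  \qquad c(X) = \mathbb{E}\left[ e_Y \mid g(X) \right].
\]
```

When $\phi$ is the negative Shannon entropy, $d_\phi(e_Y, g(X)) = -\log g_Y(X)$, so the left-hand side is the expected negative log-likelihood. The second term on the right is a calibration error, and the first measures how informative the predictions are, which is why a perfectly calibrated but uninformative predictor can still have a poor negative log-likelihood.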
Taxonomy
Topics: Advanced Data Processing Techniques · Explainable Artificial Intelligence (XAI) · Neural Networks and Applications
