Reassessing How to Compare and Improve the Calibration of Machine Learning Models

Muthu Chidambaram; Rong Ge

arXiv:2406.04068·cs.LG·February 25, 2025

TL;DR

This paper critically examines calibration metrics in machine learning, introduces a new visualization method to jointly assess calibration and generalization, and provides theoretical insights into calibration errors and estimator consistency.

Contribution

It reveals limitations of current calibration evaluation practices, proposes an extended reliability diagram for joint visualization, and proves new theoretical results on calibration errors and estimator consistency.

Findings

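01 Trivial recalibration methods can appear state-of-the-art when only calibration and accuracy metrics are reported, without generalization metrics such as negative log-likelihood.

02 The proposed extension of reliability diagrams visualizes calibration and generalization jointly, which exposes trade-offs between the two.

03 New theoretical results relate calibration errors to one another and establish estimator consistency.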

Abstract

A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. This property has become increasingly important as the impact of machine learning models has continued to spread to various domains. As a result, there are now a dizzying number of recent papers on measuring and improving the calibration of (specifically deep learning) models. In this work, we reassess the reporting of calibration metrics in the recent literature. We show that there exist trivial recalibration approaches that can appear seemingly state-of-the-art unless calibration and prediction metrics (i.e. test accuracy) are accompanied by additional generalization metrics such as negative log-likelihood. We then use a calibration-based decomposition of Bregman divergences to develop a new extension to reliability…
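To make the abstract's warning concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes an equal-width binned estimate of top-label calibration error and a hypothetical "flattening" recalibrator that replaces every top-class confidence with the overall accuracy; the function names `binned_ece`, `nll`, and `flatten_to_mean_confidence`, and the toy data, are all illustrative assumptions.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=10):
    """Equal-width binned estimate of top-label expected calibration error."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin ids fall in 0 .. n_bins - 1.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

def nll(probs, labels):
    """Average negative log-likelihood of the true labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))

def flatten_to_mean_confidence(probs, target_conf):
    """Hypothetical trivial recalibrator: keep each predicted class, but set its
    probability to target_conf and spread the rest evenly over other classes."""
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    out = np.full((n, k), (1.0 - target_conf) / (k - 1))
    out[np.arange(n), probs.argmax(axis=1)] = target_conf
    return out

# Toy data (assumed for illustration): four samples, two classes.
base_probs = np.array([[0.95, 0.05], [0.95, 0.05], [0.65, 0.35], [0.65, 0.35]])
labels = np.array([0, 0, 0, 1])

base_correct = base_probs.argmax(axis=1) == labels
accuracy = base_correct.mean()

# For simplicity the accuracy is computed on the same samples; in practice it
# would be estimated on held-out data.
trivial_probs = flatten_to_mean_confidence(base_probs, accuracy)
trivial_correct = trivial_probs.argmax(axis=1) == labels

base_ece = binned_ece(base_probs.max(axis=1), base_correct)
trivial_ece = binned_ece(trivial_probs.max(axis=1), trivial_correct)
base_nll = nll(base_probs, labels)
trivial_nll = nll(trivial_probs, labels)

print(f"accuracy: {accuracy:.2f}")
print(f"ECE  base={base_ece:.3f}  trivial={trivial_ece:.3f}")
print(f"NLL  base={base_nll:.3f}  trivial={trivial_nll:.3f}")
```

On this toy data the flattened predictor keeps the same accuracy and drives the binned calibration error to zero, yet its negative log-likelihood is worse (roughly 0.56 versus 0.40), because it has discarded the information in the original confidences. Reporting calibration error and accuracy alone would hide that loss.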

Code & Models

Repositories

2014mchidamb/reassessing-calibration
PyTorch · Official

Taxonomy

Topics: Advanced Data Processing Techniques · Explainable Artificial Intelligence (XAI) · Neural Networks and Applications