How Flawed Is ECE? An Analysis via Logit Smoothing
Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov

TL;DR
This paper examines the limitations of the widely used expected calibration error (ECE) metric, characterizes its discontinuities, and proposes a continuous alternative, Logit-Smoothed ECE (LS-ECE), whose practical viability is demonstrated in experiments on pre-trained image classification models.
Contribution
The paper gives a theoretical analysis of ECE's discontinuities and introduces LS-ECE, a continuous, easily estimated miscalibration metric motivated by the nature of those discontinuities.
Findings
ECE's discontinuities are characterized mathematically.
LS-ECE closely tracks ECE in practical scenarios.
The theoretical issues with ECE may have limited impact in practical settings.
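To make the first finding concrete, here is a standard textbook-style example of a discontinuity in ECE. It is an illustration chosen for this summary, not necessarily the construction used in the paper. The symbols $f_0$, $f_\varepsilon$, and the two-point input space are assumptions introduced for the example.

```latex
% Setup (assumed for illustration): X is uniform on two points x_0, x_1,
% and the label is deterministic, Y = 0 at x_0 and Y = 1 at x_1.
% ECE of a predictor f is  ECE(f) = E | E[Y | f(X)] - f(X) |.
\begin{align*}
f_0(x) &= \tfrac12 \ \text{for all } x
  &&\Rightarrow\quad \mathbb{E}[Y \mid f_0(X)] = \tfrac12,
  \quad \mathrm{ECE}(f_0) = 0, \\
f_\varepsilon(x_1) &= \tfrac12 + \varepsilon,\quad
f_\varepsilon(x_0) = \tfrac12 - \varepsilon
  &&\Rightarrow\quad \mathbb{E}[Y \mid f_\varepsilon(X) = \tfrac12 \pm \varepsilon] \in \{1, 0\}, \\
\mathrm{ECE}(f_\varepsilon)
  &= \tfrac12\left|1 - \left(\tfrac12 + \varepsilon\right)\right|
   + \tfrac12\left|0 - \left(\tfrac12 - \varepsilon\right)\right|
   = \tfrac12 - \varepsilon .
\end{align*}
% As \varepsilon \to 0, f_\varepsilon \to f_0 uniformly, yet
% ECE(f_\varepsilon) \to 1/2 while ECE(f_0) = 0.
```

An arbitrarily small change to the predictor moves ECE from 0 to nearly 1/2, which is the kind of behavior a continuous metric is meant to rule out.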
Abstract
Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE…
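The binned ECE mentioned in the abstract is commonly estimated by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin. The following is a minimal sketch of that common estimator, assuming equal-width bins and top-label confidence. The function name `binned_ece` and the default of 15 bins are choices made for this illustration, not details taken from the paper, and the sketch does not implement LS-ECE.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Equal-width binned ECE estimate (illustrative sketch).

    confidences: predicted confidence of each prediction, in [0, 1]
    correct:     1/True where the prediction was right, else 0/False
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each confidence to a right-closed bin (lo, hi]; 0.0 falls in the first bin.
    idx = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of samples that land in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

if __name__ == "__main__":
    # Hypothetical softmax outputs for three samples and their true labels.
    probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
    labels = np.array([0, 1, 1])
    conf = probs.max(axis=1)
    hit = probs.argmax(axis=1) == labels
    print(binned_ece(conf, hit, n_bins=10))
```

Because the estimate depends on where the bin edges fall, small shifts in confidence can move samples across bins and change the result. This sensitivity is one reason to look for smoother alternatives.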
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference
