Verified Uncertainty Calibration
Ananya Kumar, Percy Liang, Tengyu Ma

TL;DR
This paper introduces a new calibration method combining scaling and binning to improve probability calibration accuracy and proposes a more precise calibration error estimator, validated on image classification datasets.
Contribution
The authors develop the scaling-binning calibrator that combines parametric fitting with binning, reducing sample complexity and improving calibration accuracy over existing methods.
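A minimal sketch of the scale-then-bin idea, under assumptions this summary does not spell out: binary labels, Platt scaling as the parametric family, equal-mass bins, a half/half data split, and each bin outputting the mean of the scaled values that fall in it. The function names `fit_platt` and `scaling_binning_fit` are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def fit_platt(scores, labels, steps=2000, lr=0.1):
    """Fit g(z) = sigmoid(a*z + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels                      # d(log loss)/d(logit)
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

def scaling_binning_fit(scores, labels, num_bins=10):
    """Return a predict(z) function with at most num_bins distinct outputs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(scores) // 2

    # Step 1 (scaling): fit the parametric function on the first half.
    a, b = fit_platt(scores[:n], labels[:n])

    def g(z):
        return 1.0 / (1.0 + np.exp(-(a * z + b)))

    # Step 2 (binning): equal-mass bin edges from the scaled second half.
    scaled = g(scores[n:])
    edges = np.quantile(scaled, np.linspace(0.0, 1.0, num_bins + 1)[1:-1])
    bin_ids = np.searchsorted(edges, scaled, side="right")

    # Step 3: each bin outputs the mean of the scaled values in it
    # (histogram binning would use the mean of the labels instead).
    fallback = scaled.mean()
    bin_means = np.array([
        scaled[bin_ids == k].mean() if np.any(bin_ids == k) else fallback
        for k in range(num_bins)
    ])

    def predict(z):
        ids = np.searchsorted(edges, g(np.asarray(z, dtype=float)), side="right")
        return bin_means[ids]

    return predict
```

Because the final output takes at most `num_bins` distinct values, its calibration error can be checked bin by bin, while the parametric fit keeps the per-bin averages from depending on noisy label means.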
Findings
In the paper's image classification experiments, scaling-binning achieves 35% lower calibration error than histogram binning.
The new estimator measures calibration error more accurately with fewer samples.
Because the calibrator's outputs are binned, its calibration error can be measured, unlike scaling methods such as Platt scaling and temperature scaling.
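To make "calibration error" concrete for a model with finitely many outputs, here is a sketch of the standard plug-in estimate of the l_p calibration error, (E[|E[Y | f(X)] - f(X)|^p])^(1/p), computed from samples. This is not the paper's improved estimator, whose form this summary does not give; the function name `plugin_calibration_error` is hypothetical.

```python
import numpy as np

def plugin_calibration_error(probs, labels, p=2):
    """Plug-in l_p calibration error for predictions with finitely many values."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    total = 0.0
    for value in np.unique(probs):
        mask = probs == value
        # Gap between the empirical label frequency and the predicted value.
        gap = abs(labels[mask].mean() - value)
        # Weight the gap by the fraction of samples receiving this prediction.
        total += mask.mean() * gap ** p
    return total ** (1.0 / p)
```

For example, a model that predicts 0.5 for every input when every label is 1 has a gap of 0.5 in its only bin, so the estimate is 0.5.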
Abstract
Applications such as weather forecasting and personalized medicine demand models that output calibrated probability estimates, that is, estimates representative of the true likelihood of a prediction. Most models are not calibrated out of the box but are recalibrated by post-processing model outputs. We find in this work that popular recalibration methods like Platt scaling and temperature scaling (i) are less calibrated than reported, and (ii) have a calibration error that current techniques cannot estimate. An alternative method, histogram binning, has measurable calibration error but is sample inefficient: it requires $O(B/\epsilon^2)$ samples, compared to $O(1/\epsilon^2)$ for scaling methods, where $B$ is the number of distinct probabilities the model can output. To get the best of both worlds, we introduce the scaling-binning calibrator, which first fits a parametric function to reduce variance…
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference
