T-Cal: An optimal test for the calibration of predictive models
Donghwan Lee, Xinmeng Huang, Hamed Hassani, Edgar Dobriban

TL;DR
This paper introduces T-Cal, a minimax optimal hypothesis test for assessing whether a probabilistic predictive model is calibrated. It targets the setting where the conditional class probabilities are smooth functions, and it is designed to be broadly applicable in practice.
Contribution
The paper develops T-Cal, a new hypothesis testing method for calibration that is minimax optimal and adaptive to unknown smoothness, filling a gap in reliable calibration assessment.
Findings
T-Cal is minimax optimal for calibration testing.
It performs well with deep neural networks and standard calibration methods.
The method is practical and broadly applicable.
Abstract
The prediction accuracy of machine learning methods is steadily increasing, but the calibration of their uncertainty predictions poses a significant challenge. Numerous works focus on obtaining well-calibrated predictive models, but less is known about reliably assessing model calibration. This limits our ability to know when algorithms for improving calibration have a real effect, and when their improvements are merely artifacts due to random noise in finite datasets. In this work, we consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem. The null hypothesis is that the predictive model is calibrated, while the alternative hypothesis is that the deviation from calibration is sufficiently large. We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently…
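To make the hypothesis-testing framing concrete, here is a minimal sketch of a calibration test for a binary classifier. It is not T-Cal: it swaps in a simple binned squared-gap statistic and calibrates it by simulating labels under the null hypothesis that the model is calibrated. The function names, the number of bins, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def binned_calibration_stat(probs, labels, n_bins=10):
    """Weighted sum over equal-width bins of (mean label - mean prediction)^2."""
    # Assign each prediction to a bin; a prediction of exactly 1.0 goes in the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    stat = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = labels[mask].mean() - probs[mask].mean()
            stat += mask.mean() * gap ** 2  # mask.mean() is the fraction of points in the bin
    return stat

def calibration_test(probs, labels, n_bins=10, n_sim=1000, seed=0):
    """Monte Carlo p-value for H0: P(label = 1 | prediction = p) = p."""
    rng = np.random.default_rng(seed)
    observed = binned_calibration_stat(probs, labels, n_bins)
    null_stats = np.empty(n_sim)
    for s in range(n_sim):
        # Under H0, each label is a Bernoulli draw with the predicted probability.
        sim_labels = (rng.random(probs.shape[0]) < probs).astype(float)
        null_stats[s] = binned_calibration_stat(probs, sim_labels, n_bins)
    # The +1 terms keep the p-value strictly positive and the test valid.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_sim)
    return observed, p_value

# Synthetic example (assumed data): one calibrated and one mis-calibrated label set.
rng = np.random.default_rng(1)
probs = rng.uniform(0.0, 1.0, size=2000)
labels_cal = (rng.random(2000) < probs).astype(float)       # true P(y=1 | p) = p
labels_mis = (rng.random(2000) < probs ** 2).astype(float)  # true P(y=1 | p) = p^2

stat_cal, p_cal = calibration_test(probs, labels_cal)
stat_mis, p_mis = calibration_test(probs, labels_mis)
print(f"calibrated:     stat={stat_cal:.4f}, p-value={p_cal:.3f}")
print(f"mis-calibrated: stat={stat_mis:.4f}, p-value={p_mis:.3f}")
```

A small p-value is evidence against calibration. Because the statistic depends on a fixed binning, this sketch carries none of the optimality or adaptivity guarantees that the paper claims for T-Cal; it only illustrates the null-versus-alternative setup described in the abstract.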
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Machine Learning and Algorithms
