T-Cal: An optimal test for the calibration of predictive models

Donghwan Lee; Xinmeng Huang; Hamed Hassani; Edgar Dobriban

arXiv:2203.01850 · stat.ML · December 7, 2023

TL;DR

This paper introduces T-Cal, a minimax optimal statistical test for assessing the calibration of probabilistic models, especially effective when class probabilities are smooth functions, with broad practical applicability.

Contribution

The paper develops T-Cal, a new hypothesis testing method for calibration that is minimax optimal and adaptive to unknown smoothness, filling a gap in reliable calibration assessment.

Findings

01 T-Cal is minimax optimal for calibration testing.

02 It performs well with deep neural networks and standard calibration methods.

03 The method is practical and broadly applicable.

Abstract

The prediction accuracy of machine learning methods is steadily increasing, but the calibration of their uncertainty predictions poses a significant challenge. Numerous works focus on obtaining well-calibrated predictive models, but less is known about reliably assessing model calibration. This limits our ability to know when algorithms for improving calibration have a real effect, and when their improvements are merely artifacts due to random noise in finite datasets. In this work, we consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem. The null hypothesis is that the predictive model is calibrated, while the alternative hypothesis is that the deviation from calibration is sufficiently large. We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently…
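
The hypothesis-testing framing in the abstract can be illustrated with a minimal sketch for binary classification. This is not the T-Cal procedure itself. It is a generic binned calibration test under stated assumptions: the function names, the equal-width binning, the debiased squared-residual statistic, and the Monte Carlo threshold are all illustrative choices. Under the null hypothesis that the model is calibrated, each label is a Bernoulli draw with the predicted probability, so the null distribution of the statistic can be simulated from the predictions alone.

```python
import numpy as np

def debiased_binned_statistic(probs, labels, n_bins=10):
    """Debiased estimate of a binned squared calibration error (illustrative)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    residuals = labels - probs
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    sum_r = np.bincount(bins, weights=residuals, minlength=n_bins)
    sum_r2 = np.bincount(bins, weights=residuals ** 2, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    ok = counts >= 2  # cross terms need at least two points in a bin
    # Sum over i != j of r_i * r_j equals (sum r)^2 - sum r^2; dropping the
    # diagonal terms removes the bias caused by label noise.
    cross = sum_r[ok] ** 2 - sum_r2[ok]
    return float(np.sum(cross / counts[ok]) / probs.shape[0])

def calibration_test(probs, labels, n_bins=10, alpha=0.05, n_sim=1000, seed=0):
    """Test H0: the model is calibrated, using a simulated null distribution."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    observed = debiased_binned_statistic(probs, labels, n_bins)
    null_stats = np.empty(n_sim)
    for s in range(n_sim):
        # Under H0, labels are Bernoulli(probs) given the predictions.
        simulated = rng.random(probs.shape[0]) < probs
        null_stats[s] = debiased_binned_statistic(probs, simulated, n_bins)
    # Monte Carlo p-value with the usual +1 correction.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_sim + 1)
    return observed, float(p_value), bool(p_value <= alpha)
```

Calling `calibration_test(probs, labels)` on held-out predictions returns the statistic, a p-value, and a reject decision at level `alpha`. The bin count controls a trade-off: more bins can detect finer deviations from calibration but leave fewer points per bin, which is one reason the smoothness of the conditional class probabilities matters for detectability.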

Code & Models

Repositories

dh7401/t-cal (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Machine Learning and Algorithms