TCE: A Test-Based Approach to Measuring Calibration Error
Takuo Matsubara, Niek Tax, Richard Mudd, Ido Guy

TL;DR
This paper introduces TCE, a statistical test-based metric for measuring the calibration error of probabilistic binary classifiers, offering a clear interpretation and a scale that is unaffected by class imbalance.
Contribution
The paper presents TCE, a calibration error metric built on a statistical test, together with an optimality criterion and algorithm for binning under bin-size constraints and a visualization that improves on the standard reliability diagram.
Findings
TCE provides a consistent and interpretable calibration error measure.
The new binning algorithm selects bins that minimize the estimation error of the empirical probabilities, subject to bin-size constraints.
Experiments on multiple imbalanced real-world datasets and ImageNet 1000 demonstrate TCE's properties.
Abstract
This paper proposes a new metric to measure the calibration error of probabilistic binary classifiers, called test-based calibration error (TCE). TCE incorporates a novel loss function based on a statistical test to examine the extent to which model predictions differ from probabilities estimated from data. It offers (i) a clear interpretation, (ii) a consistent scale that is unaffected by class imbalance, and (iii) an enhanced visual representation with respect to the standard reliability diagram. In addition, we introduce an optimality criterion for the binning procedure of calibration error metrics based on a minimal estimation error of the empirical probabilities. We provide a novel computational algorithm for optimal bins under bin-size constraints. We demonstrate properties of TCE through a range of experiments, including multiple real-world imbalanced datasets and ImageNet 1000.
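The abstract does not spell out the test or the binning procedure, so the following is only a rough sketch of how a test-based calibration score of this kind might be computed, not the authors' implementation. It assumes an exact two-sided binomial test, equal-width bins (rather than the paper's optimal binning), and a significance level of 0.05. The names tce_sketch, binom_two_sided_pvalue, n_bins, and alpha are hypothetical and chosen for illustration.

```python
from math import comb


def binom_two_sided_pvalue(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials under rate p.

    Sums the probabilities of all outcomes that are no more likely than k.
    Intended for small bins; very large n can overflow float conversion.
    """
    pmf = [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def tce_sketch(probs, labels, n_bins=10, alpha=0.05):
    """Percentage of predictions rejected by a per-bin binomial test.

    Assumptions (not taken from the source): equal-width bins on [0, 1],
    a binomial test, and alpha = 0.05.
    """
    # Group (prediction, label) pairs into equal-width bins.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    rejected = 0
    for members in bins:
        if not members:
            continue
        n = len(members)
        k = sum(y for _, y in members)  # observed positives in this bin
        for p, _ in members:
            # Test whether prediction p is plausible given the bin's labels.
            if binom_two_sided_pvalue(k, n, p) < alpha:
                rejected += 1
    return 100.0 * rejected / len(probs)
```

Under these assumptions, twenty predictions of 0.5 with half of the labels positive give a score of 0.0, while twenty predictions of 0.9 with no positive labels give 100.0. The score reads as the percentage of predictions that the test finds inconsistent with the data, which is one way to obtain the fixed, interpretable scale the abstract describes.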
Taxonomy
Topics: Digital Imaging for Blood Diseases · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
