Beyond ECE: Calibrated Size Ratio, Risk Assessment, and Confidence-Weighted Metrics
Fernando Martin-Maroto, Nabil Abderrahaman, Gonzalo G. de Polavieja

TL;DR
This paper introduces the Calibrated Size Ratio (CSR) and confidence-weighted metrics as improved tools for assessing model calibration and discriminative confidence, addressing limitations of the traditional Expected Calibration Error (ECE).
Contribution
It proposes the CSR metric, together with confidence-weighted versions of accuracy and AUC, as more interpretable and more comprehensive methods for calibration and risk assessment.
Findings
CSR effectively distinguishes risky from non-risky confidence assignments.
Confidence-weighted metrics capture calibration information that classical metrics miss.
Standard methods can produce risky confidence profiles in real datasets.
Abstract
Confidence calibration has been dominated by the Expected Calibration Error (ECE), a linear metric that counts calibration offset equally regardless of the confidence level at which it occurs. We show that ECE can remain small even under arbitrarily large overconfidence risk, and we therefore propose the Calibrated Size Ratio (CSR), an interpretable metric that equals 1 under perfect calibration and from which we derive the risk probability that quantifies the statistical evidence for overconfidence. We further argue that overconfidence risk assessment must be complemented by a measure of discriminative value: whether the assigned confidences actively distinguish correct from incorrect predictions. We show that confidence-weighted accuracy is the natural such complement, and that confidence-weighting extends to all standard classification metrics. In particular,…
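To make the claim about ECE concrete, here is a minimal Python sketch on made-up toy data. `expected_calibration_error` is the standard equal-width binned ECE. `confidence_weighted_accuracy` is a hypothetical helper that weights each prediction by its confidence; this is one plausible reading of the term, assumed for illustration, and not necessarily the authors' definition. CSR and the risk probability are not sketched because their formulas are not given here.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Standard equal-width binned ECE: sum over bins of (|B|/N) * |acc(B) - conf(B)|."""
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Confidence in [0, 1] -> bin index; c == 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(ok for _, ok in members) / len(members)
        # Each bin's offset counts linearly, weighted only by the bin's share of samples.
        ece += len(members) / n * abs(acc - avg_conf)
    return ece


def confidence_weighted_accuracy(
    confidences: Sequence[float], correct: Sequence[bool]
) -> float:
    """HYPOTHETICAL definition (an assumption, not taken from the paper):
    accuracy in which each prediction is weighted by its confidence."""
    total = sum(confidences)
    return sum(c for c, ok in zip(confidences, correct) if ok) / total


# Toy data (made up): 990 predictions at confidence 0.7, of which 693 (70%) are correct,
# plus 10 predictions at confidence 0.99 that are all wrong.
confidences = [0.7] * 990 + [0.99] * 10
correct = [True] * 693 + [False] * 297 + [False] * 10

ece = expected_calibration_error(confidences, correct)
acc = sum(correct) / len(correct)
cwa = confidence_weighted_accuracy(confidences, correct)

print(round(ece, 4))  # 0.0099: ten confidently wrong predictions barely move ECE
print(round(acc, 4))  # 0.693
print(round(cwa, 4))  # 0.6901: below plain accuracy, since the highest confidence sits on errors
```

Because the badly overconfident bin enters ECE only through its share of samples (10/1000 here), shrinking that share drives ECE toward zero no matter how confident those wrong predictions are.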
