All Models Are Miscalibrated, But Some Less So: Comparing Calibration with Conditional Mean Operators
Peter Moskvichev, Dino Sejdinovic

TL;DR
This paper introduces the conditional kernel calibration error (CKCE), a new metric for assessing model calibration that is more robust and effective for relative comparisons, especially under distribution shifts.
Contribution
The paper proposes CKCE, a calibration metric based on Hilbert-Schmidt norms of conditional mean operators, improving robustness and comparison accuracy over existing metrics.
Findings
CKCE ranks models by calibration error more consistently than previously proposed calibration metrics.
CKCE is more robust to distribution shift.
Experimental results on synthetic and real data validate CKCE's effectiveness.
Abstract
When working in a high-risk setting, having well-calibrated probabilistic predictive models is a crucial requirement. However, estimators for calibration error are not always able to correctly distinguish which model is better calibrated. We propose the conditional kernel calibration error (CKCE), which is based on the Hilbert-Schmidt norm of the difference between conditional mean operators. By working directly with the definition of strong calibration as the distance between conditional distributions, which we represent by their embeddings in reproducing kernel Hilbert spaces, the CKCE is less sensitive to the marginal distribution of predictive models. This makes it more effective for relative comparisons than previously proposed calibration metrics. Our experiments, using both synthetic and real data, show that CKCE provides a more consistent ranking of models by their…
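To make the idea of a Hilbert-Schmidt norm between conditional mean operators concrete, here is a minimal sketch in Python. It is not the estimator from the paper, whose details are not given on this page. It assumes a classification setting, a Gaussian kernel on the predicted probability vectors, a delta kernel on labels (so the embedding of a predicted distribution is just its probability vector), and a standard ridge-regularized estimate of the conditional mean operator. The names `rbf_gram` and `ckce_sketch` and the values of `bandwidth` and `reg` are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def ckce_sketch(probs, labels, bandwidth=0.5, reg=0.1):
    """Squared HS norm of the difference between two estimated operators.

    Assumption: with a delta kernel on labels, the observed label embeds as
    a one-hot vector and the predicted distribution embeds as probs itself,
    so the two operators differ only through the residual matrix below.
    """
    n, num_classes = probs.shape
    resid = np.eye(num_classes)[labels] - probs        # one-hot minus prediction
    k = rbf_gram(probs, probs, bandwidth)              # Gram matrix on predictions
    # Ridge-regularized conditional mean operator weights: (K + n*reg*I)^{-1} R
    alpha = np.linalg.solve(k + n * reg * np.eye(n), resid)
    # trace(alpha^T K alpha) is the squared HS norm of the operator difference
    return float(np.sum(alpha * (k @ alpha)))

# Synthetic check: labels are drawn from true_probs, so a model reporting
# true_probs is calibrated, while one reporting shifted columns is not.
rng = np.random.default_rng(0)
true_probs = rng.dirichlet(np.ones(3), size=300)
labels = np.array([rng.choice(3, p=row) for row in true_probs])
shifted_probs = np.roll(true_probs, 1, axis=1)         # deliberately miscalibrated

score_calibrated = ckce_sketch(true_probs, labels)
score_shifted = ckce_sketch(shifted_probs, labels)
print(score_calibrated, score_shifted)
```

On this synthetic data the calibrated model should receive the smaller score. Both models are conditioned on their own predictions, which is the sense in which the comparison works through conditional distributions rather than the marginal distribution of the predictions.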
Taxonomy
Topics: Neural Networks and Applications
