Honest calibration assessment for binary outcome predictions
Timo Dimitriadis, Lutz Duembgen, Alexander Henzi, Marius Puke, Johanna Ziegel

TL;DR
This paper introduces honest, adaptive confidence bands for assessing the calibration of binary outcome predictions, providing valid, narrower, and locally adaptive tools for evaluating model calibration.
Contribution
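It proposes novel confidence bands for the calibration curve that are valid under the sole assumption that the curve is isotonic (nondecreasing), adapt to the local smoothness of the curve and the local variance of the binary outcomes, and are narrower than existing approaches.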
Findings
Finite sample coverage guarantee
Narrower than existing approaches
Provides informative calibration assessments in real-data applications
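For reference, the objects behind these findings can be written in standard notation. Here $X \in [0, 1]$ denotes the predicted probability, $Y \in \{0, 1\}$ the binary outcome, and $(L, U)$ a generic lower and upper band at confidence level $1 - \alpha$. These symbols are assumed for illustration, and the last line is the generic form of a finite-sample simultaneous coverage statement, not a verbatim restatement of the paper's theorem (the paper specifies the exact set of $x$ values and conditions).

```latex
\begin{aligned}
  &\text{calibration curve:} && p(x) = \mathbb{P}(Y = 1 \mid X = x), \quad x \in [0, 1] \\
  &\text{isotonicity:} && x \le x' \implies p(x) \le p(x') \\
  &\text{perfect calibration } (H_0)\text{:} && p(x) = x \quad \text{for all } x \in [0, 1] \\
  &\text{simultaneous coverage:} && \mathbb{P}\bigl( L(x) \le p(x) \le U(x) \ \text{for all } x \bigr) \ge 1 - \alpha
\end{aligned}
```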
Abstract
Probability predictions from binary regressions or machine learning methods ought to be calibrated: If an event is predicted to occur with probability $x$, it should materialize with approximately that frequency, which means that the so-called calibration curve $p$ should equal the identity, $p(x) = x$ for all $x$ in the unit interval. We propose honest calibration assessment based on novel confidence bands for the calibration curve, which are valid only subject to the natural assumption of isotonicity. Besides testing the classical goodness-of-fit null hypothesis of perfect calibration, our bands facilitate inverted goodness-of-fit tests whose rejection allows for the sought-after conclusion of a sufficiently well specified model. We show that our bands have a finite sample coverage guarantee, are narrower than existing approaches, and adapt to the local smoothness of the…
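The abstract rests on two ideas: a calibration curve estimated under isotonicity, and a confidence band used either to reject perfect calibration or, in the inverted direction, to conclude approximate calibration. The sketch below illustrates both under stated assumptions. It estimates p(x) with the standard pool-adjacent-violators (PAV) isotonic regression, which is a common estimator for calibration curves but is not the authors' band construction. The band (lower, upper) is taken as a given input on a finite grid, and the tolerance eps defining "sufficiently well specified" is a hypothetical choice. The function names pav_calibration_curve, rejects_perfect_calibration, and concludes_approx_calibration are illustrative only.

```python
from typing import List, Sequence, Tuple


def pav_calibration_curve(
    x: Sequence[float], y: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Isotonic least-squares estimate of p(x) = P(Y = 1 | X = x).

    Uses the pool-adjacent-violators (PAV) algorithm. Returns the sorted
    distinct predictions and the fitted, nondecreasing curve at those points.
    """
    # Sort by prediction and pool ties so each distinct x value is one block.
    xs: List[float] = []
    sums: List[float] = []
    counts: List[int] = []
    for xi, yi in sorted(zip(x, y)):
        if xs and xs[-1] == xi:
            sums[-1] += yi
            counts[-1] += 1
        else:
            xs.append(xi)
            sums.append(float(yi))
            counts.append(1)

    # Each block stores [sum of outcomes, number of outcomes, number of distinct x].
    blocks: List[List[float]] = []
    for s, c in zip(sums, counts):
        blocks.append([s, c, 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += k2

    # Expand block means back to one fitted value per distinct x.
    fitted: List[float] = []
    for s, c, k in blocks:
        fitted.extend([s / c] * int(k))
    return xs, fitted


def rejects_perfect_calibration(
    grid: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> bool:
    """Classical test (sketch): reject p(x) = x if the identity leaves the band at some grid point."""
    return any(not (lo <= t <= up) for t, lo, up in zip(grid, lower, upper))


def concludes_approx_calibration(
    grid: Sequence[float], lower: Sequence[float], upper: Sequence[float], eps: float
) -> bool:
    """Inverted test (sketch): conclude |p(x) - x| <= eps on the grid if the band lies inside that tube."""
    return all(t - eps <= lo and up <= t + eps for t, lo, up in zip(grid, lower, upper))


if __name__ == "__main__":
    xs, fitted = pav_calibration_curve([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
    print(fitted)  # [0.0, 0.5, 0.5, 1.0]
```

The inverted check only makes sense with a simultaneous band: if the whole band sits within eps of the identity, the hypothesis that p deviates from the identity by more than eps somewhere is rejected at level alpha. Checking a finite grid rather than the whole unit interval is a simplification of this sketch.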
Taxonomy
Topics: Statistical Methods and Inference · Explainable Artificial Intelligence (XAI) · Statistical Methods and Bayesian Inference
