Calibration through the Lens of Interpretability
Alireza Torabian, Ruth Urner

TL;DR
This paper conducts an axiomatic analysis of calibration, cataloguing desirable properties of calibrated models and the metrics used to evaluate them, and empirically compares common calibration methods with a simple, interpretable decision tree.
Contribution
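It introduces an axiomatic framework for calibration: a catalogue of desirable properties of calibrated models and corresponding evaluation metrics, with an analysis of their feasibility and correspondences. An empirical evaluation compares common calibration methods against a simple, interpretable decision tree.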
Findings
Certain calibration metrics align with desirable properties.
Interpretable decision trees can serve as effective calibration models.
The axiomatic approach clarifies the trade-offs in calibration evaluation.
Abstract
Calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy. A calibrated model is a function whose values correctly reflect underlying label probabilities. Calibration in itself, however, does not imply classification accuracy or human-interpretable estimates, nor is it straightforward to verify calibration from finite data. There is a plethora of evaluation metrics (and loss functions) that each assess a specific aspect of a calibration model. In this work, we initiate an axiomatic study of the notion of calibration. We catalogue desirable properties of calibrated models as well as corresponding evaluation metrics and analyze their feasibility and correspondences. We complement this analysis with an empirical evaluation, comparing common calibration methods to employing a simple, interpretable decision tree.
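The abstract's point that calibration does not imply classification accuracy can be illustrated with a small sketch. It uses binned expected calibration error (ECE), one common calibration metric; the listing does not say which metrics the paper analyzes, so the choice of ECE, the synthetic data, and all function and variable names here are assumptions for illustration only.

```python
import random

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |mean predicted prob - empirical positive rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        pos_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_p - pos_rate)
    return ece

def accuracy(probs, labels, threshold=0.5):
    """Fraction of points where thresholding the probability recovers the label."""
    return sum((p >= threshold) == bool(y) for p, y in zip(probs, labels)) / len(labels)

# Hypothetical synthetic data: each point has true positive probability 0.1 or 0.9.
rng = random.Random(0)
true_probs = [rng.choice([0.1, 0.9]) for _ in range(20000)]
labels = [1 if rng.random() < q else 0 for q in true_probs]

base_rate = sum(labels) / len(labels)
constant = [base_rate] * len(labels)                            # always predicts the base rate
informative = true_probs                                        # predicts the true probability
overconfident = [1.0 if q > 0.5 else 0.0 for q in true_probs]   # same decisions, extreme scores

for name, probs in [("constant", constant),
                    ("informative", informative),
                    ("overconfident", overconfident)]:
    print(f"{name:14s} ECE={expected_calibration_error(probs, labels):.3f} "
          f"accuracy={accuracy(probs, labels):.3f}")
```

In this setup the constant predictor has essentially zero binned ECE while being no better than a majority guess, and the overconfident predictor matches the informative one in accuracy while scoring noticeably worse on ECE. This is the kind of gap between properties that motivates looking at several metrics rather than one.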
