Calibration through the Lens of Interpretability

Alireza Torabian; Ruth Urner

arXiv:2412.00943·cs.LG·December 3, 2024

TL;DR

This paper conducts an axiomatic analysis of calibration: it catalogues desirable properties of calibrated models and the corresponding evaluation metrics, and empirically compares common calibration methods against a simple, interpretable decision tree.

Contribution

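The paper initiates an axiomatic study of calibration, analyzing the feasibility of desirable properties and evaluation metrics and the correspondences between them, and complements this analysis with an empirical comparison of common calibration methods against an interpretable decision tree.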

Findings

01. Certain calibration metrics align with desirable properties.

02. Interpretable decision trees can serve as effective calibration models.

03. The axiomatic approach clarifies the trade-offs in calibration evaluation.

Abstract

Calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy. A calibrated model is a function whose values correctly reflect underlying label probabilities. Calibration in itself, however, implies neither classification accuracy nor human-interpretable estimates, nor is it straightforward to verify calibration from finite data. There is a plethora of evaluation metrics (and loss functions) that each assess a specific aspect of a calibration model. In this work, we initiate an axiomatic study of the notion of calibration. We catalogue desirable properties of calibrated models as well as corresponding evaluation metrics and analyze their feasibility and correspondences. We complement this analysis with an empirical evaluation, comparing common calibration methods to employing a simple, interpretable decision tree.
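To make the abstract's notion of a calibration metric concrete, here is a minimal sketch of one commonly used metric, the binned expected calibration error (ECE), for binary labels. This listing does not say which metrics the paper analyzes, so the choice of ECE, the function name `expected_calibration_error`, and the equal-width binning with `n_bins` are all illustrative assumptions rather than the authors' method.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE for binary labels (illustrative, not taken from the paper).

    Returns the count-weighted mean over bins of
    |mean predicted probability - observed positive rate|.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one prediction")

    # Per-bin running sums of predicted probabilities and observed labels.
    prob_sum = [0.0] * n_bins
    label_sum = [0.0] * n_bins
    count = [0] * n_bins
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        prob_sum[b] += p
        label_sum[b] += y
        count[b] += 1

    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        gap = abs(prob_sum[b] / count[b] - label_sum[b] / count[b])
        ece += (count[b] / n) * gap
    return ece


# A constant predictor that outputs the base rate has zero binned error
# but carries no information for classification.
constant_ece = expected_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])

# An overconfident predictor: says 0.9, but only half the labels are positive.
overconfident_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
```

The constant predictor illustrates the abstract's point that calibration alone does not imply classification accuracy. The dependence of the result on `n_bins` and on the sample size is one reason verifying calibration from finite data is not straightforward.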
