On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers
Markus Kängsepp, Kaspar Valk, Meelis Kull

TL;DR
This paper introduces a new perspective on classifier calibration evaluation by viewing it as a fit-on-the-test problem, enabling the use of calibration methods for evaluation, tuning, and benchmarking.
Contribution
It proposes the fit-on-the-test view, which lets calibration evaluation reuse calibration-fitting techniques, and it introduces new calibration families and benchmarking datasets.
Findings
Using post-hoc calibration methods to estimate the calibration map improves the accuracy of calibration evaluation.
Cross-validation helps tune the number of bins used for ECE.
New calibration families PL and PL3 outperform existing methods.
Abstract
Every uncalibrated classifier has a corresponding true calibration map that calibrates its confidence. Deviations of this idealistic map from the identity map reveal miscalibration. Such calibration errors can be reduced with many post-hoc calibration methods which fit some family of calibration maps on a validation dataset. In contrast, evaluation of calibration with the expected calibration error (ECE) on the test set does not explicitly involve fitting. However, as we demonstrate, ECE can still be viewed as if fitting a family of functions on the test data. This motivates the fit-on-the-test view on evaluation: first, approximate a calibration map on the test data, and second, quantify its distance from the identity. Exploiting this view allows us to unlock missed opportunities: (1) use the plethora of post-hoc calibration methods for evaluating calibration; (2) tune the number of…
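The two steps named in the abstract (approximate a calibration map on the test data, then measure its distance from the identity) can be illustrated with a small sketch. It is not taken from the paper's code. It assumes equal-width bins with left-closed edges, and a per-bin map of the form c_hat(p) = p + offset fitted by least squares; the function names (`fit_binned_shift_map`, `distance_from_identity`, `classic_ece`) are hypothetical. Under these assumptions the least-squares offset in each bin equals accuracy minus mean confidence, so the mean absolute distance from the identity coincides with the usual binned ECE.

```python
from typing import Callable, List, Sequence


def _bin_index(p: float, n_bins: int) -> int:
    # Equal-width bins on [0, 1]; p == 1.0 is placed in the last bin.
    return min(int(p * n_bins), n_bins - 1)


def fit_binned_shift_map(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> Callable[[float], float]:
    """Step 1: fit c_hat(p) = p + offset_b on the test data.

    The least-squares offset in bin b is mean(correct - conf) over that bin,
    i.e. bin accuracy minus bin mean confidence. Empty bins get offset 0.
    """
    sums: List[float] = [0.0] * n_bins
    counts: List[int] = [0] * n_bins
    for p, y in zip(conf, correct):
        b = _bin_index(p, n_bins)
        sums[b] += y - p
        counts[b] += 1
    offsets = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def c_hat(p: float) -> float:
        return p + offsets[_bin_index(p, n_bins)]

    return c_hat


def distance_from_identity(
    c_hat: Callable[[float], float], conf: Sequence[float]
) -> float:
    """Step 2: mean absolute deviation of the fitted map from the identity."""
    return sum(abs(c_hat(p) - p) for p in conf) / len(conf)


def classic_ece(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Usual binned ECE: weighted sum of |accuracy - mean confidence| per bin."""
    acc_sum = [0.0] * n_bins
    conf_sum = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(conf, correct):
        b = _bin_index(p, n_bins)
        acc_sum[b] += y
        conf_sum[b] += p
        counts[b] += 1
    n = len(conf)
    return sum(
        (counts[b] / n) * abs(acc_sum[b] / counts[b] - conf_sum[b] / counts[b])
        for b in range(n_bins)
        if counts[b]
    )
```

Swapping `fit_binned_shift_map` for any other fitted calibration map (for example, one produced by a post-hoc calibration method) leaves `distance_from_identity` unchanged, which is the reuse the abstract points to.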
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Machine Learning and Algorithms
