TL;DR
This paper evaluates various estimators of Expected Calibration Error (ECE) to improve calibration assessment of probabilistic classifiers, proposing an empirical procedure to identify the most reliable estimators for different scenarios.
Contribution
It introduces a comprehensive empirical evaluation of ECE estimators, including novel ones, and provides guidelines for selecting the best estimator in practice.
Findings
Some ECE estimators estimate calibration error more accurately than others.
The proposed evaluation procedure helps identify reliable calibration metrics.
More reliable calibration assessment supports the use of models in decision making.
Abstract
Uncertainty in probabilistic classifiers' predictions is a key concern when models are used to support human decision making, in broader probabilistic pipelines, or when sensitive automatic decisions have to be taken. Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities. Hence, being able to calibrate these models, or enforce calibration while learning them, has regained interest in recent literature. In this context, properly assessing calibration is paramount to quantify new contributions tackling calibration. However, there is room for improvement for commonly used metrics, and evaluation of calibration could benefit from deeper analyses. Thus, this paper focuses on the empirical evaluation of calibration metrics in the context of classification. More specifically it evaluates…
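For readers unfamiliar with the quantity being estimated, here is a minimal sketch of one widely used ECE estimator: equal-width binning of top-label confidences. It is only an illustration of what "an ECE estimator" can look like, not the estimator the paper proposes or recommends. The function name `binned_ece`, its parameters, and the default of 10 bins are assumptions made for this example.

```python
import numpy as np


def binned_ece(confidences, correct, n_bins=10):
    """Equal-width binned ECE estimate (illustrative sketch, hypothetical helper).

    confidences: predicted probability of the predicted class, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges split [0, 1] into n_bins equal-width bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1])  # values in 0..n_bins-1

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between observed accuracy and mean confidence in this bin.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        # Weight the gap by the fraction of samples falling in the bin.
        ece += mask.mean() * gap
    return ece
```

The result depends on the number of bins and on how the bins are placed. That sensitivity is one reason binned estimators can disagree with one another, which is the kind of issue an empirical comparison of estimators is meant to expose.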