What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability
John Kirchenbauer, Jacob Oaks, Eric Heim

TL;DR
This paper emphasizes the importance of context-specific calibration metrics for classifiers, demonstrating that traditional ECE measures and calibration techniques may not be adequate across different practical reliability definitions.
Contribution
The authors develop generalized calibration metrics based on ECE for various reliability definitions and empirically evaluate neural network calibration under these metrics.
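For reference, the standard binned form of ECE that such generalizations start from is usually written as below. This is the commonly used top-label definition, not the paper's generalized metric. The symbols are assumptions introduced here: n is the number of samples, S_b is the set of samples whose predicted-class confidence falls in bin b, and B is the number of bins.

```latex
\[
\mathrm{ECE} \;=\; \sum_{b=1}^{B} \frac{|S_b|}{n}\,
  \bigl|\,\mathrm{acc}(S_b) - \mathrm{conf}(S_b)\,\bigr|
\]
% acc(S_b):  fraction of samples in S_b whose predicted class is correct
% conf(S_b): mean predicted-class probability over S_b
```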
Findings
Traditional ECE metrics score only the predicted class and miss other aspects of reliability.
Common calibration techniques do not uniformly improve calibration across diverse reliability metrics.
Different definitions of reliability require tailored calibration evaluation methods.
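The first finding can be illustrated with a small sketch. It computes the usual top-label ECE alongside a class-wise variant that scores every class probability, not just the largest one. The class-wise variant is a common alternative chosen here for illustration; it is not necessarily one of the metrics the paper derives. The function names (`binned_gap`, `top_label_ece`, `classwise_ece`) and the toy data are hypothetical.

```python
import numpy as np


def binned_gap(conf, hit, n_bins=10):
    """Bin-weighted mean of |empirical frequency - mean confidence|."""
    conf = np.asarray(conf, dtype=float)
    hit = np.asarray(hit, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1.
    idx = np.digitize(conf, edges[1:-1], right=True)
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # mask.mean() is the fraction of samples landing in this bin.
            total += mask.mean() * abs(hit[mask].mean() - conf[mask].mean())
    return float(total)


def top_label_ece(probs, labels, n_bins=10):
    """ECE on the predicted class only (the traditional definition)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pred = probs.argmax(axis=1)
    return binned_gap(probs.max(axis=1), pred == labels, n_bins)


def classwise_ece(probs, labels, n_bins=10):
    """Average of per-class binned gaps (illustrative variant, an assumption)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    gaps = [binned_gap(probs[:, c], labels == c, n_bins)
            for c in range(probs.shape[1])]
    return float(np.mean(gaps))


# Toy data (made up): every sample gets the same prediction [0.6, 0.3, 0.1].
# The predicted class is right 60% of the time, but the other two class
# probabilities do not match how often those classes actually occur.
probs = np.tile([0.6, 0.3, 0.1], (10, 1))
labels = np.array([0] * 6 + [2] * 4)

print("top-label ECE:", top_label_ece(probs, labels))
print("class-wise ECE:", classwise_ece(probs, labels))
```

On this toy data the top-label error is about 0 while the class-wise average is about 0.2, so a classifier can look perfectly calibrated under one definition of reliability and clearly miscalibrated under another.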
Abstract
Classifier calibration has received recent attention from the machine learning community due both to its practical utility in facilitating decision making and to the observation that modern neural network classifiers are poorly calibrated. Much of this focus has been on learning classifiers whose output with largest magnitude (the "predicted class") is calibrated. However, this narrow interpretation of classifier outputs does not adequately capture the variety of practical use cases in which classifiers can aid in decision making. In this work, we argue that more expressive metrics must be developed that accurately measure calibration error for the specific context in which a classifier will be deployed. To this end, we derive a number of different metrics using a generalization of Expected Calibration Error (ECE) that measure calibration error under…
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
