Mitigating Bias in Calibration Error Estimation

Rebecca Roelofs; Nicholas Cain; Jonathon Shlens; Michael C. Mozer

arXiv:2012.08668 · cs.LG · February 14, 2022 · 23 cites


TL;DR

This paper measures the statistical bias in standard estimates of calibration error (the gap between a model's confidence and its accuracy) and identifies less biased estimators, including a newly proposed one, ECE_sweep.

Contribution

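It introduces a framework for computing an estimator's bias on an evaluation set of a given size, using synthetic model outputs that match the statistics of common neural architectures on popular data sets, and uses it to identify lower-bias estimators such as ECE_sweep.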
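The bias-assessment framework can be sketched in a few lines: draw synthetic confidences and labels from a known model, so the true calibration error (TCE) is known exactly, then average an estimator's error over many evaluation sets of a fixed size. This is a minimal sketch under our own assumptions, not the paper's code: confidences follow a Beta(a, b) distribution, the true accuracy at confidence c is a hand-picked function `calib_fn(c)`, TCE is taken as the expected absolute gap and approximated by Monte Carlo, and all function names are hypothetical.

```python
import numpy as np


def true_calibration_error(calib_fn, a, b, n_mc=1_000_000, seed=0):
    """Monte Carlo estimate of TCE = E_c[|calib_fn(c) - c|] for c ~ Beta(a, b) (assumed model)."""
    rng = np.random.default_rng(seed)
    c = rng.beta(a, b, size=n_mc)
    return float(np.mean(np.abs(calib_fn(c) - c)))


def sample_outputs(calib_fn, a, b, n, rng):
    """Synthesize n (confidence, correct) pairs: c ~ Beta(a, b), y ~ Bernoulli(calib_fn(c))."""
    c = rng.beta(a, b, size=n)
    y = (rng.random(n) < calib_fn(c)).astype(float)
    return c, y


def ece_equal_width(c, y, n_bins=15):
    """Plug-in L1 binned calibration error with equal-width bins on [0, 1]."""
    idx = np.minimum((c * n_bins).astype(int), n_bins - 1)  # clamp c == 1.0 into the last bin
    total = 0.0
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            # weight = fraction of examples in bin k; gap = |accuracy - mean confidence|
            total += mask.mean() * abs(y[mask].mean() - c[mask].mean())
    return float(total)


def estimator_bias(estimator, calib_fn, a, b, n, trials=200, seed=1):
    """Mean of (estimate - TCE) over `trials` synthetic evaluation sets of size n."""
    rng = np.random.default_rng(seed)
    tce = true_calibration_error(calib_fn, a, b)
    estimates = [estimator(*sample_outputs(calib_fn, a, b, n, rng)) for _ in range(trials)]
    return float(np.mean(estimates)) - tce


if __name__ == "__main__":
    identity = lambda c: c          # perfectly calibrated model: TCE is exactly 0
    overconfident = lambda c: c**2  # accuracy below confidence everywhere in (0, 1)
    for name, fn in [("calibrated", identity), ("overconfident", overconfident)]:
        print(name, estimator_bias(ece_equal_width, fn, a=5.0, b=1.0, n=500))
```

For the identity curve the TCE is zero, so any positive average is pure estimator bias. The `estimator` argument accepts any function of `(c, y)`, which is what lets different estimators be compared at the same evaluation-set size.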

Findings

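01. Binning-based estimators with equal-mass bins (the same number of instances per bin) have lower bias than those with equal-width bins.
02. The proposed ECE_sweep estimator, which uses equal-mass bins, gives a less biased estimate of calibration error than standard binned estimators.
03. The debiased estimator and ECE_sweep are more reliable calibration-error estimators than the standard ECE_bin.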
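The two estimators named in the findings can be sketched as below. ECE_sweep follows our reading of the paper: equal-mass bins, with the bin count increased until the per-bin accuracies stop being monotonically non-decreasing, keeping the last monotone count. The debiased version subtracts, from each bin's squared gap, an unbiased estimate of the sampling variance of the bin accuracy, acc_k(1 - acc_k)/(n_k - 1); this is one standard correction and may differ in detail from the variant evaluated in the paper. Function names, the L_p exponent `p`, and the 15-bin default are our assumptions.

```python
import numpy as np


def equal_mass_bins(c, y, n_bins):
    """Sort by confidence and split into n_bins chunks of (nearly) equal size."""
    order = np.argsort(c)
    return list(zip(np.array_split(c[order], n_bins), np.array_split(y[order], n_bins)))


def binned_ece(bins, n, p=1):
    """L_p binned error: (sum_k |B_k|/n * |acc_k - conf_k|^p)^(1/p)."""
    total = sum(len(cb) / n * abs(yb.mean() - cb.mean()) ** p for cb, yb in bins)
    return float(total ** (1.0 / p))


def ece_sweep(c, y, p=1):
    """Binned error at the largest equal-mass bin count whose bin accuracies are still
    monotone non-decreasing, sweeping upward and stopping at the first violation."""
    n = len(c)
    best = equal_mass_bins(c, y, 1)
    for b in range(2, n + 1):
        bins = equal_mass_bins(c, y, b)
        accs = np.array([yb.mean() for _, yb in bins])
        if np.any(np.diff(accs) < 0):
            break  # monotonicity broken: keep the previous (monotone) binning
        best = bins
    return binned_ece(best, n, p)


def ece_debiased(c, y, n_bins=15):
    """L2 equal-mass binned error with each bin's sampling-noise term subtracted."""
    n = len(c)
    total = 0.0
    for cb, yb in equal_mass_bins(c, y, n_bins):
        nk = len(cb)
        if nk < 2:
            continue  # the variance estimate needs at least two examples
        acc, conf = yb.mean(), cb.mean()
        total += nk / n * ((acc - conf) ** 2 - acc * (1 - acc) / (nk - 1))
    return float(np.sqrt(max(total, 0.0)))  # clip: the corrected sum can go negative
```

In the worst case (labels already sorted by confidence) the sweep visits every bin count up to n, so a cap on the bin count may be worth adding for large evaluation sets.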

Abstract

For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared. Most research in calibration focuses on techniques to reduce this empirical measure of calibration error, ECE_bin. We instead focus on assessing statistical bias in this empirical measure, and we identify better estimators. We propose a framework through which we can compute the bias of a particular estimator for an evaluation data set of a given size. The framework involves synthesizing model outputs that have the same statistics as common neural architectures on popular data sets. We find that binning-based estimators with bins of equal mass (number of instances) have lower bias than estimators with bins of equal width. Our results indicate two…
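The binned estimator described in the abstract, with the two binning schemes it compares, can be sketched as follows. Assumptions (ours, not the paper's code): `c` holds top-label confidences in [0, 1], `y` holds 0/1 correctness, per-bin gaps are aggregated as a weighted absolute difference (an L1 form), and `ece_bin` with its `scheme` and `n_bins` parameters is a hypothetical helper.

```python
import numpy as np


def ece_bin(c, y, n_bins=15, scheme="width"):
    """Plug-in L1 binned calibration error.

    scheme="width": bins of equal width on [0, 1].
    scheme="mass":  bins holding (nearly) equal numbers of examples.
    """
    c = np.asarray(c, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(c)
    if scheme == "width":
        idx = np.minimum((c * n_bins).astype(int), n_bins - 1)  # c == 1.0 goes in the last bin
        groups = [idx == k for k in range(n_bins)]
    elif scheme == "mass":
        order = np.argsort(c)
        groups = []
        for chunk in np.array_split(order, n_bins):  # consecutive runs of sorted indices
            mask = np.zeros(n, dtype=bool)
            mask[chunk] = True
            groups.append(mask)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    # Weighted mean over non-empty bins of |per-bin accuracy - per-bin mean confidence|.
    return float(sum(m.sum() / n * abs(y[m].mean() - c[m].mean()) for m in groups if m.any()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.beta(5.0, 1.0, size=1000)        # confidences skewed toward 1 (assumed)
    y = (rng.random(1000) < c).astype(float)  # labels drawn from a perfectly calibrated model
    print(ece_bin(c, y, scheme="width"), ece_bin(c, y, scheme="mass"))
```

On the four-point set c = [0.1, 0.6, 0.7, 0.8], y = [0, 1, 1, 0] with two bins, equal-width binning gives 0.05 and equal-mass binning gives 0.2, so on small evaluation sets the choice of scheme alone can change the estimate substantially.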


Code & Models

Repositories

google-research/google-research
tf · Official


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning