Confidence Calibration under Ambiguous Ground Truth

Linwei Tao; Haoyang Luo; Minjing Dong; Chang Xu

arXiv:2603.22879 · cs.LG · March 25, 2026


Open Access

TL;DR

This paper addresses confidence calibration when ground-truth labels are ambiguous because annotators disagree. It proposes ambiguity-aware post-hoc methods that improve calibration without retraining the model.

Contribution

The paper introduces a family of post-hoc calibrators that account for label ambiguity, including Dirichlet-Soft, MCTS S=1, and LS-TS. These outperform standard methods under ambiguous ground truth.

Findings

01

Dirichlet-Soft reduces true-label ECE by up to 87%.

02

MCTS S=1 matches full-distribution calibration with only one annotation.

03

LS-TS improves calibration without requiring annotator data.
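The findings report results in terms of "true-label ECE", which this page does not define. The sketch below assumes it means a binned ECE in which the annotator probability of the predicted class replaces the 0/1 hit against a majority label. The function names, the 15-bin equal-width scheme, and the synthetic Dirichlet data are hypothetical choices for illustration, not the paper's evaluation code. The toy compares a model whose probabilities equal the annotator distribution with a sharpened copy at temperature 0.05.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def binned_ece(confidences, correctness, n_bins=15):
    """Equal-width binned ECE; `correctness` may be 0/1 or a probability in [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    idx = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(confidences[mask].mean() - correctness[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return float(ece)


def ece_vs_majority(probs, annot):
    """Conventional ECE: a prediction counts as correct iff it hits the majority label."""
    pred = probs.argmax(axis=1)
    hit = (pred == annot.argmax(axis=1)).astype(float)
    return binned_ece(probs.max(axis=1), hit)


def ece_vs_annotators(probs, annot):
    """Hypothetical 'true-label' ECE: correctness is the annotator mass on the predicted class."""
    pred = probs.argmax(axis=1)
    hit = annot[np.arange(len(pred)), pred]
    return binned_ece(probs.max(axis=1), hit)


# Synthetic data (hypothetical): 2000 inputs, 3 classes, Dirichlet(1, 1, 1) annotator distributions.
rng = np.random.default_rng(0)
annot = rng.dirichlet(np.ones(3), size=2000)
logits = np.log(annot)                   # idealised model: softmax(logits) equals annot
faithful = softmax(logits, T=1.0)        # reproduces annotator uncertainty
sharpened = softmax(logits, T=0.05)      # overconfident copy with the same argmax

for name, probs in [("faithful", faithful), ("sharpened", sharpened)]:
    print(name, round(ece_vs_majority(probs, annot), 3), round(ece_vs_annotators(probs, annot), 3))
```

Under these assumptions, the sharpened model earns a small ECE against majority labels but shows a large gap against the annotator distribution, which is the failure mode the abstract describes. The faithful model shows the opposite pattern: it looks miscalibrated against majority labels while matching the annotators exactly.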

Abstract

Confidence calibration assumes a unique ground-truth label per input, yet this assumption fails wherever annotators genuinely disagree. Post-hoc calibrators fitted on majority-voted labels, the standard single-label targets used in practice, can appear well-calibrated under conventional evaluation yet remain substantially miscalibrated against the underlying annotator distribution. We show that this failure is structural: under simplifying assumptions, Temperature Scaling is biased toward temperatures that underestimate annotator uncertainty, with true-label miscalibration increasing monotonically with annotation entropy. To address this, we develop a family of ambiguity-aware post-hoc calibrators that optimise proper scoring rules against the full label distribution and require no model retraining. Our methods span progressively weaker annotation requirements: Dirichlet-Soft leverages…
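The abstract contrasts Temperature Scaling fitted on majority-voted labels with optimising a proper scoring rule against the full label distribution. The toy sketch below illustrates that contrast. It assumes an idealised model whose logits are the log of the annotator distribution, and it fits the log score (cross-entropy) by grid search. This is plain soft-label temperature scaling, not the paper's Dirichlet-Soft, MCTS, or LS-TS calibrators, and all names and data are hypothetical.

```python
import numpy as np


def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def log_score(logits, targets, T):
    """Mean cross-entropy of softmax(logits / T) against target rows that sum to 1."""
    log_q = np.log(softmax(logits, T) + 1e-12)  # small epsilon guards against log(0)
    return float(-(targets * log_q).sum(axis=1).mean())


def fit_temperature(logits, targets, grid=np.linspace(0.05, 5.0, 1000)):
    """Choose T by grid search, a simple stand-in for the usual gradient-based fit."""
    losses = [log_score(logits, targets, T) for T in grid]
    return float(grid[int(np.argmin(losses))])


rng = np.random.default_rng(0)
annot = rng.dirichlet(np.ones(3), size=2000)   # hypothetical annotator distributions
logits = np.log(annot)                         # idealised model: softmax(logits) equals annot
majority = np.eye(3)[annot.argmax(axis=1)]     # one-hot majority-vote targets

T_soft = fit_temperature(logits, annot)        # fitted on the full label distribution
T_hard = fit_temperature(logits, majority)     # fitted on majority-voted labels
print(f"T fitted on annotator distribution: {T_soft:.3f}")
print(f"T fitted on majority vote:          {T_hard:.3f}")
```

Against soft targets, the fit recovers T ≈ 1, the temperature that reproduces the annotator distribution. Against one-hot majority labels, the idealised model's argmax always agrees with the majority, so the loss keeps falling as T shrinks and the fit runs to the lower edge of the grid. This is an extreme case of the bias toward temperatures that underestimate annotator uncertainty.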


Taxonomy

Topics: Machine Learning and Data Classification · Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI)