Confidence Calibration under Ambiguous Ground Truth
Linwei Tao, Haoyang Luo, Minjing Dong, Chang Xu

TL;DR
This paper addresses the problem of confidence calibration when ground truth labels are ambiguous due to annotator disagreement, proposing ambiguity-aware methods that improve calibration without retraining.
Contribution
It introduces a family of post-hoc calibrators that account for label ambiguity, including Dirichlet-Soft, MCTS S=1, and LS-TS, which outperform standard methods under ambiguous ground truth.
Findings
Dirichlet-Soft reduces true-label ECE by up to 87%.
MCTS S=1 matches full-distribution calibration with only one annotation.
LS-TS improves calibration without requiring annotator data.
Abstract
Confidence calibration assumes a unique ground-truth label per input, yet this assumption fails wherever annotators genuinely disagree. Post-hoc calibrators fitted on majority-voted labels, the standard single-label targets used in practice, can appear well-calibrated under conventional evaluation yet remain substantially miscalibrated against the underlying annotator distribution. We show that this failure is structural: under simplifying assumptions, Temperature Scaling is biased toward temperatures that underestimate annotator uncertainty, with true-label miscalibration increasing monotonically with annotation entropy. To address this, we develop a family of ambiguity-aware post-hoc calibrators that optimise proper scoring rules against the full label distribution and require no model retraining. Our methods span progressively weaker annotation requirements: Dirichlet-Soft leverages…
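The bias described above can be illustrated with a toy sketch. This is not the authors' implementation: the synthetic data, the grid search, and the names `soft_nll` and `fit_temperature` are all assumptions made for illustration. The simulated model is overconfident by a known factor of 3, and its top class always agrees with the majority vote. Temperature Scaling is then fitted twice by minimising cross-entropy, once against majority-voted one-hot labels and once against the full annotator distribution.

```python
import numpy as np


def soft_nll(logits, targets, temperature):
    """Mean cross-entropy between target distributions and softmax(logits / T)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # stabilise the log-sum-exp
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


def fit_temperature(logits, targets, grid=None):
    """Pick the temperature on a log-spaced grid that minimises soft_nll.

    The grid bounds (0.05 to 20) are an arbitrary choice for this sketch.
    """
    if grid is None:
        grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 400))
    losses = [soft_nll(logits, targets, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Synthetic setup (an assumption, not data from the paper).
rng = np.random.default_rng(0)
n_samples, n_classes = 2000, 3

# Each row is a hypothetical annotator label distribution for one input.
soft_labels = rng.dirichlet(np.full(n_classes, 2.0), size=n_samples)

# An overconfident model: dividing these logits by T = 3 recovers soft_labels.
logits = 3.0 * np.log(soft_labels + 1e-6)

# Majority-voted one-hot targets, as used by standard post-hoc calibration.
hard_labels = np.eye(n_classes)[soft_labels.argmax(axis=1)]

t_hard = fit_temperature(logits, hard_labels)
t_soft = fit_temperature(logits, soft_labels)
print(f"T fitted on majority-vote labels: {t_hard:.2f}")
print(f"T fitted on annotator distributions: {t_soft:.2f}")
```

In this toy setting the fit against annotator distributions lands near the known value of 3. The majority-vote fit is driven to the lower end of the grid, because sharpening the predictions always lowers the loss when the model never disagrees with the one-hot label. Real models also make errors, so the gap in practice would be less extreme than here.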
Taxonomy
Topics: Machine Learning and Data Classification · Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI)
