Soft-Label Training Preserves Epistemic Uncertainty

Agamdeep Singh; Ashish Tiwari; Hosein Hasanbeig; Priyanshu Gupta

arXiv:2511.14117 · cs.LG · November 19, 2025

TL;DR

This paper advocates for training models on annotation distributions instead of single labels to better capture epistemic uncertainty, especially in subjective tasks, leading to more calibrated confidence without sacrificing accuracy.

Contribution

It shows empirically that soft-label training, which keeps the full annotation distribution as the training target, aligns model uncertainty with the diversity of human perception on ambiguous data.

Findings

01. Across vision and NLP tasks, soft-label training achieves 32% lower KL divergence from human annotations.

02. Soft-label training yields a 61% stronger correlation between model entropy and annotation entropy.

03. Models trained with soft labels match the accuracy of traditional hard-label methods.
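
The two uncertainty metrics named in the findings can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the direction KL(human || model), the use of Pearson correlation, and the toy distributions are all assumptions made here for the example.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: one row per example, one column per class.
human = [[0.5, 0.5, 0.0], [0.9, 0.1, 0.0], [0.34, 0.33, 0.33]]
model = [[0.6, 0.4, 0.0], [0.8, 0.1, 0.1], [0.4, 0.3, 0.3]]

# Mean divergence of model predictions from the annotation distributions.
mean_kl = sum(kl_divergence(h, m) for h, m in zip(human, model)) / len(human)

# Correlation between per-example annotation entropy and model entropy.
entropy_corr = pearson([entropy(h) for h in human], [entropy(m) for m in model])

print(f"mean KL: {mean_kl:.4f}, entropy correlation: {entropy_corr:.4f}")
```

A lower `mean_kl` means predictions sit closer to the annotation distributions; a higher `entropy_corr` means the model is uncertain on the same examples where annotators disagree.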

Abstract

Many machine learning tasks involve inherent subjectivity, where annotators naturally provide varied labels. Standard practice collapses these label distributions into single labels, aggregating diverse human judgments into point estimates. We argue that this approach is epistemically misaligned for ambiguous data: the annotation distribution itself should be regarded as the ground truth. Training on collapsed single labels forces models to express false confidence on fundamentally ambiguous cases, creating a misalignment between model certainty and the diversity of human perception. We demonstrate empirically that soft-label training, which treats annotation distributions as ground truth, preserves epistemic uncertainty. Across both vision and NLP tasks, soft-label training achieves 32% lower KL divergence from human annotations and 61% stronger correlation between model and annotation…
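
The contrast between collapsed and distributional targets can be made concrete with a small sketch. It assumes cross-entropy against the normalized vote counts as the soft-label loss, which differs from KL divergence only by the target's entropy (a constant with respect to the model); the abstract does not state which loss the authors use, and the helper names and vote data below are hypothetical.

```python
import math
from collections import Counter

def soft_label(votes, num_classes):
    """Normalized vote counts: the annotation distribution used as the target."""
    counts = Counter(votes)
    return [counts.get(c, 0) / len(votes) for c in range(num_classes)]

def hard_label(votes, num_classes):
    """Majority vote collapsed to a one-hot target."""
    majority = Counter(votes).most_common(1)[0][0]
    return [1.0 if c == majority else 0.0 for c in range(num_classes)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, logits, eps=1e-12):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

# Five hypothetical annotators split 3 vs 2 on a binary item.
votes = [0, 0, 0, 1, 1]
soft = soft_label(votes, 2)  # [0.6, 0.4]
hard = hard_label(votes, 2)  # [1.0, 0.0]

confident = [4.0, 0.0]                      # model is nearly certain of class 0
hedged = [math.log(0.6), math.log(0.4)]     # model predicts exactly 60/40

for name, logits in [("confident", confident), ("hedged", hedged)]:
    print(name,
          "hard-target loss:", round(cross_entropy(hard, logits), 3),
          "soft-target loss:", round(cross_entropy(soft, logits), 3))
```

Under the one-hot target, the near-certain prediction has the lower loss, so training pushes the model toward confidence on an item annotators split on. Under the soft target, the 60/40 prediction has the lower loss, so the model is rewarded for reproducing the disagreement.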

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Multimodal Machine Learning Applications