Learning from Ambiguous Data with Hard Labels
Zeke Xie, Zheng He, Nan Lu, Lichen Bai, Bao Li, Shuo Yang, Mingming Sun, Ping Li

TL;DR
This paper introduces Quantized Label Learning (QLL), a framework for training classifiers on ambiguous data with hard labels, improving generalization by accounting for label ambiguity and bias.
Contribution
The paper proposes a novel QLL framework and a Class-wise Positive-Unlabeled (CPU) risk estimator to effectively learn from quantized labels in ambiguous datasets.
Findings
QLL significantly improves model generalization.
The CPU risk estimator outperforms existing baselines.
Empirical results validate the effectiveness of the proposed methods.
Abstract
Real-world data often contains intrinsic ambiguity that the common single-hard-label annotation paradigm ignores. Standard training using ambiguous data with these hard labels may produce overly confident models and thus lead to poor generalization. In this paper, we propose a novel framework called Quantized Label Learning (QLL) to alleviate this issue. First, we formulate QLL as learning from (very) ambiguous data with hard labels: ideally, each ambiguous instance should be associated with a ground-truth soft-label distribution describing its corresponding probabilistic weight in each class; however, this is usually not accessible. In practice, we can only observe a quantized label of each instance, i.e., a hard label sampled (quantized) from the corresponding ground-truth soft-label distribution, which can be seen as a biased approximation of the ground-truth soft-label. Second,…
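The quantization step described in the abstract can be illustrated with a minimal sketch. It only mimics the assumed data-generating process (a hard label sampled from a soft-label distribution) and does not implement the QLL framework or the CPU risk estimator. The function names `quantize_label` and `one_hot`, the three-class soft label, and the sample size are all hypothetical choices made for illustration.

```python
import random


def quantize_label(soft_label, rng):
    """Sample one hard label (a class index) from a soft-label distribution."""
    classes = list(range(len(soft_label)))
    return rng.choices(classes, weights=soft_label, k=1)[0]


def one_hot(index, num_classes):
    """Encode a hard label as a one-hot vector."""
    return [1.0 if c == index else 0.0 for c in range(num_classes)]


rng = random.Random(0)

# Hypothetical ground-truth soft label of one ambiguous instance.
# In the setting the abstract describes, this is never observed.
soft_label = [0.6, 0.3, 0.1]

# What the learner actually sees: a single quantized (hard) label.
hard = quantize_label(soft_label, rng)
observed = one_hot(hard, len(soft_label))

# A single one-hot label puts all mass on one class, so it discards the
# ambiguity. Only the frequencies over many independent draws would
# approach the soft label, and each instance is labeled just once.
n = 20000
counts = [0] * len(soft_label)
for _ in range(n):
    counts[quantize_label(soft_label, rng)] += 1
empirical = [c / n for c in counts]

print("observed one-hot label:", observed)
print("empirical frequencies over many draws:", empirical)
```

Training directly on `observed` treats the instance as if it belonged to one class with certainty, which is the overconfidence problem the abstract attributes to standard training.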
Taxonomy
Topics: Machine Learning and Data Classification
