Learning from Ambiguous Data with Hard Labels

Zeke Xie; Zheng He; Nan Lu; Lichen Bai; Bao Li; Shuo Yang; Mingming Sun; Ping Li

arXiv:2501.01844·cs.LG·January 9, 2025

TL;DR

This paper introduces Quantized Label Learning (QLL), a framework for training classifiers on ambiguous data with hard labels, improving generalization by accounting for label ambiguity and bias.

Contribution

The paper proposes the QLL framework and a Class-wise Positive-Unlabeled (CPU) risk estimator for learning from quantized labels in ambiguous datasets.

Findings

01 QLL significantly improves model generalization.

02 The CPU risk estimator outperforms existing baselines.

03 Empirical results validate the effectiveness of the proposed methods.

Abstract

Real-world data often contains intrinsic ambiguity that the common single-hard-label annotation paradigm ignores. Standard training on ambiguous data with these hard labels may produce overly confident models and thus lead to poor generalization. In this paper, we propose a novel framework called Quantized Label Learning (QLL) to alleviate this issue. First, we formulate QLL as learning from (very) ambiguous data with hard labels. Ideally, each ambiguous instance should be associated with a ground-truth soft-label distribution describing its probabilistic weight in each class; however, this distribution is usually not accessible. In practice, we can only observe a quantized label of each instance, i.e., a hard label sampled (quantized) from the corresponding ground-truth soft-label distribution, which can be seen as a biased approximation of the ground-truth soft label. Second,…
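The quantization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `quantize_label` and `one_hot`, the three-class soft label `[0.6, 0.3, 0.1]`, and the sample count are all hypothetical choices made for illustration. It only shows how a single hard label is sampled from a soft-label distribution, and how the one-hot label a learner observes differs from that distribution.

```python
import random
from collections import Counter


def quantize_label(soft_label, rng):
    """Sample one hard label (a class index) from a soft-label distribution."""
    classes = range(len(soft_label))
    return rng.choices(classes, weights=soft_label, k=1)[0]


def one_hot(index, num_classes):
    """Encode a hard label as the one-hot vector a standard classifier trains on."""
    return [1.0 if c == index else 0.0 for c in range(num_classes)]


rng = random.Random(0)

# Hypothetical ground-truth soft label for one ambiguous instance (assumption).
soft_label = [0.6, 0.3, 0.1]

# What is observed in practice: one quantized (hard) label per instance.
hard = quantize_label(soft_label, rng)
observed = one_hot(hard, len(soft_label))

# Many independent draws approach the soft label in frequency,
# but any single one-hot label drops the ambiguity entirely.
num_draws = 10000
counts = Counter(quantize_label(soft_label, rng) for _ in range(num_draws))
empirical = [counts[c] / num_draws for c in range(len(soft_label))]

print("observed one-hot label:", observed)
print("empirical class frequencies:", empirical)
```

In this sketch the learner would see only `observed`, never `soft_label`; the gap between the two is the ambiguity that the abstract says standard hard-label training ignores.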

Taxonomy

Topics: Machine Learning and Data Classification