Learning from Noisy Labels via Conditional Distributionally Robust Optimization
Hui Guo, Grace Y. Yi, Boyu Wang

TL;DR
This paper introduces a novel robust learning framework using conditional distributionally robust optimization (CDRO) to handle noisy labels in crowdsourced datasets, improving model accuracy under high-noise conditions.
Contribution
It formulates a new CDRO-based approach with a robust pseudo-labeling algorithm and provides analytical solutions for efficient optimization, addressing label noise misspecification.
Findings
Outperforms existing methods on synthetic datasets.
Demonstrates robustness in real-world noisy label scenarios.
Provides theoretical guarantees for the proposed approach.
Abstract
While crowdsourcing has emerged as a practical solution for labeling large datasets, it presents a significant challenge in learning accurate models due to noisy labels from annotators with varying levels of expertise. Existing methods typically estimate the true label posterior, conditioned on the instance and noisy annotations, to infer true labels or adjust loss functions. These estimates, however, often overlook potential misspecification in the true label posterior, which can degrade model performance, especially in high-noise scenarios. To address this issue, we investigate learning from noisy annotations with an estimated true label posterior through the framework of conditional distributionally robust optimization (CDRO). We propose formulating the problem as minimizing the worst-case risk within a distance-based ambiguity set centered around a reference distribution. By…
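The worst-case risk described in the abstract can be written schematically as follows. This is a generic sketch of a conditional distributionally robust objective, not the paper's exact formulation: the symbols $f_\theta$ (classifier), $\ell$ (loss), $\tilde{y}$ (noisy annotations), $\hat{P}$ (estimated true label posterior used as the reference distribution), $d$ (the unspecified distance), and $\varepsilon$ (ambiguity radius) are all notation assumed here for illustration.

```latex
% Generic conditional DRO objective (assumed notation, for illustration only)
\min_{\theta} \;
\mathbb{E}_{(x,\tilde{y})}
\left[
  \sup_{Q \,\in\, \mathcal{B}_{\varepsilon}\left(\hat{P}(\cdot \mid x, \tilde{y})\right)}
  \mathbb{E}_{y \sim Q}\!\left[ \ell\big(f_\theta(x),\, y\big) \right]
\right],
\qquad
\mathcal{B}_{\varepsilon}(\hat{P})
  = \left\{ Q \;:\; d\big(Q, \hat{P}\big) \le \varepsilon \right\}
```

Setting $\varepsilon = 0$ collapses the ambiguity set to the reference distribution alone, which recovers ordinary training against the estimated posterior. Larger $\varepsilon$ guards against greater misspecification of that estimate.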
Taxonomy
Topics: Advanced Statistical Process Monitoring
Methods: Sparse Evolutionary Training
