Learning with Noisy Labels over Imbalanced Subpopulations
MingCai Chen, Yu Zhao, Bing He, Zongbo Han, Bingzhe Wu, Jianhua Yao

TL;DR
This paper introduces a learning method that handles noisy labels and imbalanced subpopulations together. It estimates each sample's probability of being clean from its correlation with other samples, uses that estimate to correct labels, and then trains with distributionally robust optimization (DRO) to improve robustness and accuracy.
Contribution
It proposes a feature-based sample-correlation method for label correction, combined with distributionally robust optimization (DRO), for learning from noisy, imbalanced data, and reports that this combination outperforms existing approaches.
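The correlation-based cleanliness estimate and label correction can be illustrated with a minimal Python sketch. It assumes a simple stand-in for the paper's estimator: a sample's clean probability is the fraction of its k nearest neighbours in feature space (by cosine similarity) that share its observed label, and the corrected label blends the one-hot label with the model's prediction in proportion to that probability. The functions `clean_probabilities` and `correct_labels`, the parameter `k`, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def clean_probabilities(features: List[Sequence[float]], labels: List[int],
                        k: int = 3) -> List[float]:
    """Hypothetical estimator: for each sample, the fraction of its k most
    similar other samples (cosine similarity) carrying the same observed label.
    Assumes at least two samples."""
    probs = []
    for i, f_i in enumerate(features):
        # Rank every other sample by similarity to sample i, most similar first.
        ranked = sorted(
            ((cosine(f_i, f_j), j) for j, f_j in enumerate(features) if j != i),
            reverse=True,
        )
        neighbours = [j for _, j in ranked[:k]]
        agree = sum(1 for j in neighbours if labels[j] == labels[i])
        probs.append(agree / len(neighbours))
    return probs


def correct_labels(labels: List[int], predictions: List[Sequence[float]],
                   clean_probs: List[float], num_classes: int) -> List[List[float]]:
    """Soft label correction: w * one_hot(label) + (1 - w) * prediction,
    where w is the sample's estimated clean probability."""
    corrected = []
    for y, pred, w in zip(labels, predictions, clean_probs):
        one_hot = [1.0 if c == y else 0.0 for c in range(num_classes)]
        corrected.append([w * o + (1.0 - w) * q for o, q in zip(one_hot, pred)])
    return corrected


# Toy example: two clusters in 2-D; sample 3 lies in the class-0 cluster but is labelled 1.
features = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [0.95, 0.05],
            [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]]
labels = [0, 0, 0, 1, 1, 1, 1]
probs = clean_probabilities(features, labels, k=3)
# probs[3] == 0.0: all three nearest neighbours of sample 3 are labelled 0

# Hypothetical model outputs (class probabilities) for the same samples.
predictions = [[0.9, 0.1]] * 3 + [[0.8, 0.2]] + [[0.1, 0.9]] * 3
corrected = correct_labels(labels, predictions, probs, num_classes=2)
# corrected[3] == [0.8, 0.2]: with clean probability 0, the prediction replaces the label
```

Consistently labelled samples keep most of their original label, while the mislabelled sample is pulled fully toward the model's prediction. A kNN vote is only one way to measure sample correlation; it is used here because it needs nothing beyond feature vectors.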
Findings
Consistently improves robustness to noisy labels over state-of-the-art methods.
Handles imbalanced subpopulations effectively across several benchmarks.
Improves generalization on real-world datasets with noisy labels and imbalanced subpopulations.
Abstract
Learning with Noisy Labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have "small loss". However, this assumption fails to generalize to some real-world cases with imbalanced subpopulations, i.e., training subpopulations that vary in sample size or recognition difficulty. As a result, recent LNL methods risk misclassifying "informative" samples (e.g., hard samples or samples in tail subpopulations) as noisy, leading to poor generalization performance. To address this issue, we propose a novel LNL method that deals with noisy labels and imbalanced subpopulations simultaneously. It first leverages sample correlation to estimate samples' clean probabilities for label correction and then utilizes corrected labels for Distributionally Robust…
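The DRO step on corrected labels can be sketched in the same way. This summary does not specify the paper's uncertainty set, so the sketch assumes a KL-regularized DRO objective, whose worst-case sample weights have the closed form w_i ∝ exp(loss_i / temperature). Samples with higher loss under the corrected labels (for example, hard or tail-subpopulation samples) are upweighted rather than discarded. The function names and the `temperature` parameter are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def soft_cross_entropy(target: Sequence[float], probs: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Cross-entropy between a (possibly soft, e.g. corrected) target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


def kl_dro_weights(losses: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Worst-case weights of a KL-regularized DRO objective: w_i ∝ exp(loss_i / temperature).
    Subtracting the maximum loss before exponentiating avoids overflow."""
    m = max(losses)
    scaled = [math.exp((loss - m) / temperature) for loss in losses]
    total = sum(scaled)
    return [s / total for s in scaled]


def dro_loss(targets: List[Sequence[float]], predictions: List[Sequence[float]],
             temperature: float = 1.0) -> Tuple[float, List[float]]:
    """Per-sample losses reweighted by the worst-case distribution.
    In a training framework the weights would be treated as constants (no gradient),
    which makes the gradient equal to that of the KL-DRO dual objective."""
    losses = [soft_cross_entropy(t, p) for t, p in zip(targets, predictions)]
    weights = kl_dro_weights(losses, temperature)
    return sum(w * l for w, l in zip(weights, losses)), weights


# Toy batch: the third sample (e.g. a hard or tail-subpopulation sample) has the highest loss.
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
predictions = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
robust, weights = dro_loss(targets, predictions, temperature=0.5)
```

Lowering `temperature` concentrates weight on the highest-loss samples, while raising it recovers ordinary average-loss training. KL-regularized DRO is one choice among several formulations (CVaR and group DRO are others), and the paper's own objective may differ.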
Taxonomy
Topics: Water Systems and Optimization · Machine Learning and Data Classification
