Sample Selection with Uncertainty of Losses for Learning with Noisy Labels
Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama

TL;DR
This paper proposes a novel sample selection method for learning with noisy labels that uses interval estimation of losses to better distinguish between mislabeled data and underrepresented correctly labeled data, improving robustness.
Contribution
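It replaces point estimates of per-sample losses with confidence intervals and selects samples by the lower bounds of those intervals, so that selection accounts for how uncertain each loss estimate is and can better separate mislabeled data from correctly labeled but underrepresented data.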
Findings
Outperforms baseline methods across various noise types
More effectively distinguishes between mislabeled and underrepresented data
Demonstrates robustness to a broad range of label noise
Abstract
In learning with noisy labels, the sample selection approach is very popular, which regards small-loss data as correctly labeled during training. However, losses are generated on-the-fly based on the model being trained with noisy labels, and thus large-loss data are likely but not certain to be incorrect. There are actually two possibilities of a large-loss data point: (a) it is mislabeled, and then its loss decreases slower than other data, since deep neural networks "learn patterns first"; (b) it belongs to an underrepresented group of data and has not been selected yet. In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses, where lower bounds of the confidence intervals of losses derived from distribution-free concentration inequalities, but not losses themselves, are used for sample selection. In this way, we…
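The idea of selecting by a lower confidence bound rather than by the loss itself can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes a generic Hoeffding-style interval width, and the names `lower_confidence_bounds`, `select_small_lcb`, `scale`, and `keep_ratio` are hypothetical. The only point it makes is that a sample observed few times gets a wider interval, so its lower bound can be small even when its average loss is large.

```python
import numpy as np

def lower_confidence_bounds(loss_sum, counts, step, scale=1.0):
    """Hoeffding-style lower bound on each sample's mean loss.

    Assumption (not from the paper): losses lie in a range of width `scale`,
    and the interval width shrinks as 1/sqrt(number of observations).
    """
    counts = np.asarray(counts, dtype=float)
    safe_counts = np.maximum(counts, 1.0)  # avoid division by zero
    mean = np.asarray(loss_sum, dtype=float) / safe_counts
    width = scale * np.sqrt(np.log(step + 1.0) / (2.0 * safe_counts))
    # Samples that were never observed get -inf so they are tried first.
    return np.where(counts > 0, mean - width, -np.inf)

def select_small_lcb(lcb, keep_ratio):
    """Return indices of the `keep_ratio` fraction with the smallest bounds."""
    k = max(1, int(round(keep_ratio * len(lcb))))
    return np.argsort(lcb, kind="stable")[:k]

# Toy data: sample 1 has the largest mean loss but was observed only once.
loss_sum = np.array([2.0, 2.5, 9.0])
counts = np.array([4, 1, 4])
mean_loss = loss_sum / counts

lcb = lower_confidence_bounds(loss_sum, counts, step=4)
picked = select_small_lcb(lcb, keep_ratio=2 / 3)
picked_by_mean = np.argsort(mean_loss, kind="stable")[:2]
print("selected by lower bound:", picked)
print("selected by mean loss:  ", picked_by_mean)
```

In this toy case, ranking by mean loss discards the rarely observed sample, while ranking by the lower bound keeps it, which mirrors case (b) in the abstract: a large loss on a sample that has seldom been selected is weak evidence that its label is wrong.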
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring
