Training Classifiers that are Universally Robust to All Label Noise Levels
Jingyi Xu, Tony Q. S. Quek, Kai Fong Ernest Chong

TL;DR
This paper introduces a distillation-based framework for training classifiers that are robust across all levels of label noise, leveraging a small trusted subset and iterative augmentation to improve accuracy in noisy datasets.
Contribution
The proposed method is a novel distillation framework that uses a small clean subset and iterative augmentation to achieve universal robustness to label noise.
Findings
Outperforms existing methods at medium to high noise levels.
Achieves a 2.94% accuracy improvement on the Clothing1M dataset.
Effective on both synthetic and real-world noisy datasets.
Abstract
For classification tasks, deep neural networks are prone to overfitting in the presence of label noise. Although existing methods are able to alleviate this problem at low noise levels, they encounter significant performance reduction at high noise levels, or even at medium noise levels when the label noise is asymmetric. To train classifiers that are universally robust to all noise levels, and that are not sensitive to any variation in the noise model, we propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning. In particular, we shall assume that a small subset of any given noisy dataset is known to have correct labels, which we treat as "positive", while the remaining noisy subset is treated as "unlabeled". Our framework consists of the following two components: (1) We shall generate, via iterative updates, an augmented clean subset…
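The positive/unlabeled framing in the abstract can be illustrated with a minimal sketch of the data partition alone. This is not the authors' implementation: the function names `split_positive_unlabeled` and `pu_targets` are hypothetical, and the sketch assumes only that the trusted subset is given as a list of sample indices. It does not cover the iterative augmentation or the distillation step.

```python
from typing import List, Sequence, Tuple


def split_positive_unlabeled(
    num_samples: int, trusted_indices: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Partition sample indices into a "positive" set (labels known to be
    correct) and an "unlabeled" set (labels of unknown correctness).

    Hypothetical helper for illustration only.
    """
    trusted = set(trusted_indices)
    if any(i < 0 or i >= num_samples for i in trusted):
        raise ValueError("trusted index out of range")
    positive = sorted(trusted)
    unlabeled = [i for i in range(num_samples) if i not in trusted]
    return positive, unlabeled


def pu_targets(num_samples: int, trusted_indices: Sequence[int]) -> List[int]:
    """Return one binary PU target per sample.

    1 means "label verified as correct"; 0 means "correctness unknown".
    A 0 does NOT mean the label is wrong, which is what separates
    Positive-Unlabeled learning from ordinary binary classification.
    """
    trusted = set(trusted_indices)
    return [1 if i in trusted else 0 for i in range(num_samples)]


# Example: six samples, of which samples 1 and 4 have verified labels.
positive, unlabeled = split_positive_unlabeled(6, [4, 1])
targets = pu_targets(6, [4, 1])
```

Here the "unlabeled" samples keep their noisy class labels; only their correctness status is treated as unknown.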
Taxonomy
Topics: Machine Learning and Data Classification · Industrial Vision Systems and Defect Detection · Imbalanced Data Classification Techniques
