Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels
Yangdi Lu, Wenbo He

TL;DR
This paper introduces a novel training framework that uses synthetic samples with soft labels to mitigate the effects of noisy labels in large datasets, improving model robustness and representation quality.
Contribution
The paper proposes a mixing strategy for creating synthetic samples and a soft-label correction method, with the aim of making deep network training more robust to noisy labels.
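To make the two ideas concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' method: this page does not describe the paper's actual mixing strategy or its label-correction rule, so the sketch substitutes a generic mixup-style convex combination for mixing and a simple blend of the given label with the model's predicted distribution for soft-label correction. The function names (`one_hot`, `mix_samples`, `soften_label`) and the parameters `lam` and `alpha` are hypothetical.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot distribution."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def mix_samples(x_a, y_a, x_b, y_b, lam):
    """Assumed mixing rule (generic mixup-style, not taken from the paper):
    convex combination of two feature vectors and their label distributions."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def soften_label(given_label, model_probs, alpha):
    """Assumed correction rule (illustrative only): blend the given,
    possibly noisy, label with the model's predicted distribution."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * t + (1.0 - alpha) * p
            for t, p in zip(given_label, model_probs)]


# Usage: soften two given labels, then mix the two samples.
y_a = soften_label(one_hot(0, 3), [0.7, 0.2, 0.1], alpha=0.5)
y_b = soften_label(one_hot(2, 3), [0.1, 0.6, 0.3], alpha=0.5)
x_mix, y_mix = mix_samples([0.0, 2.0], y_a, [2.0, 0.0], y_b, lam=0.5)
```

Because both steps are convex combinations of probability distributions, the resulting soft label still sums to one, so it can be used directly as a target for a cross-entropy loss.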
Findings
Outperforms state-of-the-art methods on CIFAR-10, CIFAR-100, Clothing1M, and Webvision datasets.
Produces more separated and clearly bounded feature clusters.
Enhances robustness of learned representations against extreme label noise.
Abstract
Noisy labels are ubiquitous in real-world datasets, especially in large-scale ones derived from crowdsourcing and web searching. It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training, resulting in poor generalization performance. During an early learning phase, deep neural networks have been observed to fit the clean samples before memorizing the mislabeled samples. In this paper, we dig deeper into the representation distributions in the early learning phase and find that, regardless of their noisy labels, learned representations of images from the same category still congregate together. Inspired by this observation, we propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels. Specifically, we propose a mixing strategy to create the synthetic samples by…
Taxonomy
Topics: Digital Filter Design and Implementation
