Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations
Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang Liu

TL;DR
This paper introduces two real-world noisy label datasets, CIFAR-10N and CIFAR-100N, based on human annotations, to better study and benchmark learning algorithms under realistic noise conditions, highlighting the instance-dependent nature of real-world label noise.
Contribution
The work provides controllable, moderate-sized datasets that pair real-world noisy labels with ground-truth labels, and benchmarks existing methods to reveal the challenges posed by human label noise patterns.
Findings
Real-world noisy labels follow instance-dependent patterns.
Existing synthetic noise assumptions do not fully capture real-world noise.
Real-world noise presents unique challenges for learning algorithms.
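To make the contrast with synthetic noise concrete, here is a minimal sketch of the kind of class-dependent (symmetric) noise that synthetic benchmarks typically assume: each label flips with a fixed probability that depends only on its class, never on the individual example. The function names, the 0.4 noise rate, and the toy label list are illustrative assumptions and are not taken from the paper or its code. The second function estimates a noise transition matrix from paired clean and noisy labels, which is the kind of analysis that ground-truth verification makes possible.

```python
import random


def symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label, with probability noise_rate, to a different class chosen uniformly.

    The flip probability ignores the instance itself, which is the
    simplifying assumption behind this form of synthetic noise.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw uniformly from the num_classes - 1 classes other than y.
            other = rng.randrange(num_classes - 1)
            noisy.append(other if other < y else other + 1)
        else:
            noisy.append(y)
    return noisy


def estimate_transition_matrix(clean, noisy, num_classes):
    """Row-normalized counts: T[i][j] approximates P(noisy = j | clean = i)."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, y_tilde in zip(clean, noisy):
        counts[y][y_tilde] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy stand-in for ground-truth labels (hypothetical, balanced over 10 classes).
clean = [i % 10 for i in range(50000)]
noisy = symmetric_noise(clean, num_classes=10, noise_rate=0.4)
T = estimate_transition_matrix(clean, noisy, num_classes=10)
```

Under this synthetic model a single matrix `T` describes the noise completely. When noise is instance-dependent, as the findings above report for human annotations, a per-class matrix only summarizes the average flip rates and cannot say which examples are likely to be mislabeled.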
Abstract
Existing research on learning with noisy labels mainly focuses on synthetic label noise. Although synthetic noise has clean structures that greatly enable statistical analyses, it often fails to model real-world noise patterns. The recent literature has seen several efforts to offer real-world noisy datasets, yet these efforts suffer from two caveats: (1) the lack of ground-truth verification makes it hard to theoretically study the properties and treatment of real-world label noise; (2) these datasets are often large-scale, which may result in unfair comparisons of robust methods within reasonable and accessible computation power. To better understand real-world label noise, it is crucial to build controllable and moderate-sized real-world noisy datasets with both ground-truth and noisy labels. This work presents two new benchmark datasets, CIFAR-10N and CIFAR-100N, equipping the…
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Imbalanced Data Classification Techniques
