An accurate detection is not all you need to combat label noise in web-noisy datasets
Paul Albert, Jack Valmadre, Eric Arazo, Tarun Krishna, Noel E. O'Connor, Kevin McGuinness

TL;DR
This paper investigates the limitations of using hyperplane-based out-of-distribution detection in noisy web datasets and proposes a hybrid method combining linear separation and small-loss techniques to improve classification accuracy.
Contribution
It reveals that hyperplane-based OOD detection misses valuable clean examples and introduces a hybrid approach that enhances noise robustness in web-crawled datasets.
Findings
Linear hyperplane detection accurately identifies OOD samples.
Hybrid method improves classification accuracy on noisy datasets.
Combining linear separation with SOTA small-loss methods yields state-of-the-art results.
Abstract
Training a classifier on web-crawled data demands learning algorithms that are robust to annotation errors and irrelevant examples. This paper builds upon the recent empirical observation that applying unsupervised contrastive learning to noisy, web-crawled datasets yields a feature representation under which the in-distribution (ID) and out-of-distribution (OOD) samples are linearly separable. We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples, and yet, surprisingly, this detection does not translate into gains in classification accuracy. Digging deeper into this phenomenon, we discover that the near-perfect detection misses a type of clean examples that are valuable for supervised learning. These examples often represent visually simple images, which are relatively easy to identify as clean examples using standard loss- or…
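The idea of pairing a linear ID/OOD separator with small-loss selection can be illustrated with a minimal sketch. This is not the authors' algorithm: the union rule, the function names (`linear_scores`, `small_loss_mask`, `hybrid_clean_mask`), the `keep_fraction` parameter, and the convention that a positive score means in-distribution are all assumptions made for illustration. The hyperplane `(w, b)` is taken as given, for example estimated on contrastive features.

```python
from typing import List, Sequence


def linear_scores(features: Sequence[Sequence[float]],
                  w: Sequence[float], b: float) -> List[float]:
    """Signed score w.x + b per sample (assumed: positive means ID side)."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]


def small_loss_mask(losses: Sequence[float], keep_fraction: float) -> List[bool]:
    """Flag the keep_fraction of samples with the lowest loss as clean."""
    n_keep = int(round(keep_fraction * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    keep = set(order[:n_keep])
    return [i in keep for i in range(len(losses))]


def hybrid_clean_mask(features: Sequence[Sequence[float]],
                      losses: Sequence[float],
                      w: Sequence[float], b: float,
                      keep_fraction: float = 0.5) -> List[bool]:
    """Hypothetical hybrid rule: keep a sample if the hyperplane places it
    on the ID side OR its training loss is among the smallest."""
    on_id_side = [s > 0 for s in linear_scores(features, w, b)]
    low_loss = small_loss_mask(losses, keep_fraction)
    return [a or c for a, c in zip(on_id_side, low_loss)]


# Toy data: sample 2 lies slightly on the OOD side of the hyperplane
# but has a very small loss, so only the small-loss criterion keeps it.
features = [[2.0, 0.0], [-1.0, 0.5], [-0.2, 0.1], [-3.0, 1.0]]
losses = [0.1, 2.5, 0.05, 3.0]
mask = hybrid_clean_mask(features, losses, w=[1.0, 0.0], b=0.0)
print(mask)
```

In this toy setup the third sample stands in for the kind of clean example that a hyperplane alone would discard; the small-loss criterion recovers it.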
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Contrastive Learning
