Differences Between Hard and Noisy-labeled Samples: An Empirical Study
Mahsa Forouzesh, Patrick Thiran

TL;DR
This paper empirically investigates the differences between hard-to-learn and noisy-labeled samples and proposes a metric for filtering out noisy labels. Filtering with this metric improves model performance on both synthetic and real-world noisy datasets.
Contribution
It introduces a systematic empirical study distinguishing hard and noisy samples and proposes a metric for filtering noisy labels to enhance learning accuracy.
Findings
Filtering noisy labels improves test accuracy.
The proposed metric effectively separates noisy from hard samples.
When used within a semi-supervised learning framework, the proposed filtering method outperforms existing approaches.
Abstract
Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist: one focuses on addressing noisy labels, and the other deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which results in a decline in the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and, more importantly, the differences between hard-to-learn samples and incorrectly labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a…
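To make the idea of a dataset with separate hardness and noisiness controls concrete, here is a minimal sketch in Python. It is not the authors' construction: the function name make_hard_noisy_dataset, the two-cluster 2D layout, the margin values, and the parameters hard_frac and noise_rate are all illustrative assumptions. "Hard" here simply means a sample drawn close to the decision boundary with its correct label, while "noisy" means a sample whose label has been flipped.

```python
import numpy as np


def make_hard_noisy_dataset(n=1000, hard_frac=0.2, noise_rate=0.1, seed=0):
    """Toy binary dataset with independent hardness and label-noise flags.

    Hypothetical illustration only; not the construction used in the paper.
    """
    rng = np.random.default_rng(seed)

    # True (clean) labels, 0 or 1.
    y_clean = rng.integers(0, 2, size=n)

    # Hard samples sit near the boundary x0 = 0; easy samples sit far from it.
    is_hard = rng.random(n) < hard_frac
    sign = 2 * y_clean - 1                  # -1 for class 0, +1 for class 1
    margin = np.where(is_hard, 0.5, 2.0)    # assumed distances to the boundary

    X = rng.normal(scale=0.3, size=(n, 2))
    X[:, 0] += sign * margin

    # Noisy samples keep their features but receive the wrong label.
    is_noisy = rng.random(n) < noise_rate
    y_noisy = np.where(is_noisy, 1 - y_clean, y_clean)

    return X, y_clean, y_noisy, is_hard, is_noisy


if __name__ == "__main__":
    X, y_clean, y_noisy, is_hard, is_noisy = make_hard_noisy_dataset()
    print("hard fraction:", is_hard.mean())
    print("noisy fraction:", is_noisy.mean())
```

Because the two flags are sampled independently, a sample can be hard, noisy, both, or neither, which is what allows a filtering method to be scored separately on how many noisy samples it removes and how many hard samples it keeps.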
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques
