Differences Between Hard and Noisy-labeled Samples: An Empirical Study

Mahsa Forouzesh; Patrick Thiran

arXiv:2307.10718 · cs.LG · July 21, 2023

TL;DR

This paper empirically investigates the differences between hard-to-learn and noisy-labeled samples, proposing a metric to filter noisy labels, which improves model performance on synthetic and real-world noisy datasets.

Contribution

It introduces a systematic empirical study distinguishing hard and noisy samples and proposes a metric for filtering noisy labels to enhance learning accuracy.

Findings

01 Filtering noisy labels improves test accuracy.

02 The proposed metric effectively separates noisy from hard samples.

03 Method outperforms existing approaches in semi-supervised learning.

Abstract

Extracting noisy or incorrectly labeled samples from a labeled dataset that also contains hard (difficult) samples is an important yet under-explored topic. Two general and often independent lines of work exist: one addresses noisy labels, and the other deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which degrades the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and, more importantly, the differences between hard-to-learn samples and incorrectly labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a…
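As a rough illustration of what a controlled noisiness level can mean in practice, the sketch below injects symmetric label noise into a clean label list. It is not taken from the paper or its repository: the function name `inject_symmetric_label_noise`, its parameters, and the choice of symmetric (uniform) flipping are assumptions made here for illustration, and the sketch does not model sample hardness at all.

```python
import random


def inject_symmetric_label_noise(labels, noise_rate, num_classes, seed=0):
    """Hypothetical helper: flip a fraction of labels to a different class.

    Assumes integer labels in [0, num_classes) and num_classes >= 2.
    Returns the noisy labels and a boolean mask marking the flipped samples.
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Choose which samples receive a wrong label, without replacement.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))

    noisy = list(labels)
    for i in sorted(flip_idx):
        # An offset in [1, num_classes - 1] guarantees the new label differs.
        offset = rng.randrange(1, num_classes)
        noisy[i] = (labels[i] + offset) % num_classes

    is_noisy = [i in flip_idx for i in range(len(labels))]
    return noisy, is_noisy


if __name__ == "__main__":
    clean = [i % 10 for i in range(1000)]
    noisy, mask = inject_symmetric_label_noise(clean, noise_rate=0.2, num_classes=10)
    print(sum(mask), "labels flipped out of", len(clean))
```

Keeping the mask alongside the noisy labels is what makes such an experiment controlled: any filtering score can later be checked against the known set of corrupted samples.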

Code & Models

Repositories

mahf93/hard-vs-noisy (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques