A Theoretical Analysis of Learning with Noisily Labeled Data

Yi Xu; Qi Qian; Hao Li; Rong Jin

arXiv:2104.04114·cs.LG·April 12, 2021

Open Access

TL;DR

This paper gives a theoretical explanation for how deep learning models behave when trained on noisily labeled data, focusing on two phenomena: clean data being learned first, and a phase transition in generalization performance.

Contribution

It offers a theoretical analysis of how deep models learn from noisy labels, explaining phenomena such as clean data being learned first and the impact of label noise on training outcomes.

Findings

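01. Examples with clean labels are learned first, within the first epoch of training.

02. After the clean-data stage, continued training further reduces test error when the rate of corrupted labels is below a certain threshold.

03. When the rate of corrupted labels is above that threshold, extensive further training can increase test error.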

Abstract

Noisy labels are very common in deep supervised learning. Although many studies aim to improve the robustness of deep training to noisy labels, few works focus on theoretically explaining the training behavior of learning with noisily labeled data, which is fundamental to understanding its generalization. In this draft, we study two such phenomena, clean data first and phase transition, and explain them from a theoretical viewpoint. Specifically, we first show that in the first epoch of training, the examples with clean labels are learned first. We then show that after the clean-data learning stage, continuing to train the model can further improve the test error when the rate of corrupted class labels is smaller than a certain threshold; otherwise, extensive training can lead to an increasing test error.
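The "clean data first" phenomenon can be probed empirically on a toy problem. The sketch below is not the paper's setting or analysis: it assumes symmetric label noise, two well-separated Gaussian classes, and a logistic regression model trained for one epoch of SGD instead of a deep network. The function names (`corrupt_labels`, `one_epoch_sgd`, `fit_rates`), the noise rate of 0.2, and the learning rate are all illustrative choices. It corrupts a fixed fraction of labels, trains for one pass, and reports how often the model's predictions agree with the given (possibly corrupted) training label on the clean and on the corrupted subsets.

```python
import numpy as np

def corrupt_labels(y, noise_rate, num_classes, rng):
    """Flip a fixed fraction of labels to a different class (symmetric noise)."""
    y = np.asarray(y)
    n = y.shape[0]
    n_flip = int(round(noise_rate * n))
    flip_idx = rng.choice(n, size=n_flip, replace=False)
    y_noisy = y.copy()
    # An offset in [1, num_classes - 1], taken modulo num_classes,
    # always lands on a class different from the original one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_noisy[flip_idx] = (y[flip_idx] + offsets) % num_classes
    is_corrupted = np.zeros(n, dtype=bool)
    is_corrupted[flip_idx] = True
    return y_noisy, is_corrupted

def one_epoch_sgd(X, y, lr, rng):
    """One pass of per-example SGD on the logistic loss (binary labels in {0, 1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for i in rng.permutation(len(y)):
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))
        g = p - y[i]              # gradient of the logistic loss w.r.t. the logit
        w -= lr * g * X[i]
        b -= lr * g
    return w, b

def fit_rates(pred, y_noisy, is_corrupted):
    """Share of clean / corrupted examples whose given training label is matched."""
    hit = pred == y_noisy
    return hit[~is_corrupted].mean(), hit[is_corrupted].mean()

rng = np.random.default_rng(0)
n = 2000
y_clean = rng.integers(0, 2, size=n)
# Two Gaussian blobs centred at (-2, -2) and (2, 2): an assumed toy dataset.
X = rng.normal(size=(n, 2)) + 2.0 * (2 * y_clean[:, None] - 1)

y_noisy, is_corrupted = corrupt_labels(y_clean, noise_rate=0.2, num_classes=2, rng=rng)
w, b = one_epoch_sgd(X, y_noisy, lr=0.01, rng=rng)
pred = (X @ w + b > 0).astype(int)

clean_fit, corrupted_fit = fit_rates(pred, y_noisy, is_corrupted)
print(f"fit on clean examples:     {clean_fit:.3f}")
print(f"fit on corrupted examples: {corrupted_fit:.3f}")
```

On a problem this easy, the model would be expected to match the clean labels far more often than the corrupted ones after a single epoch, which is the qualitative pattern the abstract describes. A linear model cannot memorize the corrupted labels, so this sketch does not exhibit the later phase-transition behavior, where extended training of an over-parameterized model on heavily corrupted labels raises test error.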

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Machine Learning and Algorithms