Impact of Label Noise on Learning Complex Features

Rahul Vashisht; P. Krishna Kumar; Harsha Vardhan Govind; Harish G. Ramaswamy

arXiv:2411.04569·cs.LG·November 8, 2024

TL;DR

Pre-training neural networks with noisy labels can enhance their ability to learn complex, diverse features and overcome the bias towards simpler decision boundaries, without sacrificing performance.

Contribution

This work demonstrates that noisy label pre-training promotes learning of complex features and diverse representations in neural networks, addressing limitations of traditional regularization methods.

Findings

01 Pre-training with noisy labels encourages learning complex functions.

02 Noisy-label pre-training leads to models that capture broader feature sets.

03 Performance remains unaffected despite learning more complex features.

Abstract

Neural networks trained with stochastic gradient descent (SGD) exhibit an inductive bias towards simpler decision boundaries, typically converging to a narrow family of functions, and often fail to capture more complex features. This phenomenon raises concerns about the capacity of deep models to adequately learn and represent real-world datasets. Traditional approaches such as explicit regularization, data augmentation, and architectural modifications have largely proven ineffective in encouraging models to learn diverse features. In this work, we investigate the impact of pre-training models with noisy labels on the dynamics of SGD across various architectures and datasets. We show that pre-training promotes learning complex functions and diverse features in the presence of noise. Our experiments demonstrate that pre-training with noisy labels encourages gradient descent to find…
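The abstract does not say how the noisy labels are generated. As a rough illustration only, the sketch below assumes symmetric (uniform) label noise, where a fixed fraction of examples is reassigned to a different class chosen at random. The function name `inject_symmetric_label_noise` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def inject_symmetric_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with a `noise_rate` fraction moved to another class.

    Assumption: symmetric noise. The paper's actual noise model is not
    stated in this summary, so treat this as one plausible choice.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2 to flip a label")
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Pick distinct positions so exactly n_flip labels are changed.
    for i in rng.sample(range(len(noisy)), n_flip):
        # An offset in [1, num_classes - 1] modulo num_classes always
        # lands on a class different from the original one.
        noisy[i] = (noisy[i] + rng.randrange(1, num_classes)) % num_classes
    return noisy


# Toy usage: 1000 labels over 10 classes, 20% of them corrupted.
clean = [i % 10 for i in range(1000)]
noisy = inject_symmetric_label_noise(clean, num_classes=10, noise_rate=0.2)
num_changed = sum(c != n for c, n in zip(clean, noisy))
```

In a pipeline of the kind the abstract describes, a model would first be trained on `noisy` and then trained further on `clean`; the training loop itself is omitted here because the summary gives no details about it.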

Taxonomy

Methods: Sparse Evolutionary Training · Stochastic Gradient Descent