Stochastic Resetting Mitigates Latent Gradient Bias of SGD from Label Noise

Youngkyoung Bae; Yeongwoo Song; Hawoong Jeong

arXiv:2406.00396 · cs.LG · March 14, 2025 · 2 cites

Open Access

TL;DR

This paper shows that stochastic resetting during SGD training can significantly reduce the negative effects of label noise, leading to better generalization in deep neural networks.

Contribution

It introduces a novel stochastic resetting method for SGD to mitigate latent gradient bias caused by noisy labels, supported by theoretical analysis and empirical validation.

Findings

1. Resetting improves generalization performance in noisy label settings.

2. Theoretical conditions for when resetting is beneficial are identified.

3. Empirical results confirm the effectiveness of the proposed method.

Abstract

Giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that resetting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually memorize the corrupted data, leading to overfitting. By deconstructing the dynamics of stochastic gradient descent (SGD), we identify the behavior of a latent gradient bias induced by noisy labels, which harms generalization. To mitigate this negative effect, we apply the stochastic resetting method to SGD, inspired by recent developments in the field of statistical physics achieving efficient target searches. We first theoretically identify the conditions where resetting becomes beneficial, and…
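The idea of resetting SGD to a checkpoint can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the function name `sgd_with_resetting`, the parameters `reset_prob` and `checkpoint_step`, the choice to save a single checkpoint at a fixed step, and the constant per-step reset probability are all hypothetical, since the abstract does not specify how the checkpoint or the reset schedule is chosen.

```python
import numpy as np


def sgd_with_resetting(grad_fn, theta0, lr=0.1, steps=100,
                       reset_prob=0.01, checkpoint_step=10, seed=0):
    """Toy SGD loop that stochastically resets parameters to a checkpoint.

    Assumptions (not from the paper): one checkpoint is saved after
    `checkpoint_step` updates, and every later step resets to it with a
    constant probability `reset_prob`.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    checkpoint = None
    n_resets = 0
    for t in range(steps):
        # Ordinary SGD update using a (possibly noisy) gradient estimate.
        theta = theta - lr * grad_fn(theta, rng)
        if t + 1 == checkpoint_step:
            # Save the parameters once, early in training.
            checkpoint = theta.copy()
        elif checkpoint is not None and rng.random() < reset_prob:
            # Stochastic reset: discard progress and return to the checkpoint.
            theta = checkpoint.copy()
            n_resets += 1
    return theta, checkpoint, n_resets


def noisy_quadratic_grad(theta, rng):
    # Gradient of 0.5 * ||theta - 1||^2 plus Gaussian noise (illustrative only).
    return (theta - 1.0) + 0.1 * rng.standard_normal(theta.shape)


theta, checkpoint, n_resets = sgd_with_resetting(
    noisy_quadratic_grad, theta0=[0.0, 0.0], reset_prob=0.05
)
print(theta, checkpoint, n_resets)
```

In a real noisy-label setting, `grad_fn` would be a mini-batch gradient of the training loss and the checkpoint would be the network's saved weights. Setting `reset_prob=0.0` recovers plain SGD, which makes the sketch easy to compare against a baseline.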

Taxonomy

Topics: Neural Networks and Applications

Methods: Stochastic Gradient Descent