Stochastic Resetting Mitigates Latent Gradient Bias of SGD from Label Noise
Youngkyoung Bae, Yeongwoo Song, Hawoong Jeong

TL;DR
This paper shows that stochastic resetting during SGD training can significantly reduce the negative effects of label noise, leading to better generalization in deep neural networks.
Contribution
It applies stochastic resetting, a method drawn from statistical physics, to SGD in order to mitigate the latent gradient bias caused by noisy labels, and supports this with theoretical analysis and empirical validation.
Findings
Resetting improves generalization performance in noisy label settings.
Theoretical conditions for when resetting is beneficial are identified.
Empirical results confirm the effectiveness of the proposed method.
Abstract
Giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that resetting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually memorize the corrupted data, leading to overfitting. By deconstructing the dynamics of stochastic gradient descent (SGD), we identify the behavior of a latent gradient bias induced by noisy labels, which harms generalization. To mitigate this negative effect, we apply the stochastic resetting method to SGD, inspired by recent developments in the field of statistical physics achieving efficient target searches. We first theoretically identify the conditions where resetting becomes beneficial, and…
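To make the idea of "resetting from a checkpoint" concrete, here is a minimal sketch of SGD with stochastic resetting. It is not the authors' implementation: the function names (`sgd_with_resetting`, `noisy_quadratic_grad`), the parameters `reset_prob` and `checkpoint_step`, the rule of saving a single checkpoint at a fixed step, and the toy quadratic objective are all assumptions made for illustration. The toy gradient stands in for a real noisy-label loss and does not itself exhibit label-noise overfitting.

```python
import random


def sgd_with_resetting(grad_fn, theta0, lr, steps, reset_prob,
                       checkpoint_step, seed=0):
    """Plain SGD that, after a checkpoint is saved, jumps back to it
    with probability `reset_prob` at every subsequent step.

    All names and the checkpointing rule are hypothetical; the excerpt
    above does not specify how the checkpoint or reset rate is chosen.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    checkpoint = None
    n_resets = 0
    for t in range(steps):
        # Ordinary SGD update using a stochastic gradient estimate.
        g = grad_fn(theta, rng)
        theta = [p - lr * gi for p, gi in zip(theta, g)]
        if t == checkpoint_step:
            # Assumption: store one checkpoint at a fixed, early step.
            checkpoint = list(theta)
        elif checkpoint is not None and rng.random() < reset_prob:
            # Stochastic reset: discard progress and return to the checkpoint.
            theta = list(checkpoint)
            n_resets += 1
    return theta, checkpoint, n_resets


def noisy_quadratic_grad(theta, rng):
    """Toy stochastic gradient of 0.5 * ||theta||^2 with Gaussian noise."""
    return [p + rng.gauss(0.0, 0.1) for p in theta]


if __name__ == "__main__":
    theta, ckpt, n = sgd_with_resetting(
        noisy_quadratic_grad, [1.0, -2.0],
        lr=0.1, steps=200, reset_prob=0.05, checkpoint_step=20)
    print("final parameters:", theta)
    print("checkpoint:", ckpt)
    print("number of resets:", n)
```

Setting `reset_prob=0.0` recovers ordinary SGD, which makes it easy to compare runs with and without resetting under the same seed.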
Taxonomy
Topics: Neural Networks and Applications
Methods: Stochastic Gradient Descent
