Anti-Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances in Flat Directions
Marcel Kühn, Bernd Rosenow

TL;DR
This paper investigates the anti-correlated nature of gradient noise in epoch-based SGD, revealing its impact on weight variance in flat directions and its potential role in improving neural network generalization.
Contribution
It provides an exact calculation of noise autocorrelation during epoch-based training and demonstrates how anti-correlations affect weight variance and model performance.
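The origin of the anti-correlation can be illustrated with a toy model. The sketch below is not the paper's calculation: it assumes each training example contributes a fixed scalar "gradient" (so the noise does not depend on the weights), and the function name `epoch_batch_noise`, the dataset size, and the batch size are hypothetical choices for illustration. Because every example is visited exactly once per epoch, the batch noise terms within one epoch sum to zero, which forces a negative correlation of -1/(M-1) between any two distinct batches when an epoch has M batches.

```python
import numpy as np

def epoch_batch_noise(per_example, batch_size, n_epochs, rng):
    """Batch-mean noise for sampling without replacement (one row per epoch).

    Assumes len(per_example) is divisible by batch_size.
    """
    n = per_example.size
    m = n // batch_size                             # batches per epoch
    centered = per_example - per_example.mean()     # noise = deviation from full gradient
    out = np.empty((n_epochs, m))
    for e in range(n_epochs):
        perm = rng.permutation(n)                   # fresh shuffle each epoch
        out[e] = centered[perm].reshape(m, batch_size).mean(axis=1)
    return out

rng = np.random.default_rng(0)
per_example = rng.normal(size=100)                  # hypothetical per-example gradients
noise = epoch_batch_noise(per_example, batch_size=20, n_epochs=20000, rng=rng)

epoch_sums = noise.sum(axis=1)                      # vanishes up to rounding in every epoch
corr = np.corrcoef(noise[:, 0], noise[:, 1])[0, 1]  # first vs. second batch of an epoch
print(f"max |epoch sum| = {np.abs(epoch_sums).max():.2e}")
print(f"correlation between two batches = {corr:.3f}, expected about {-1 / (noise.shape[1] - 1):.3f}")
```

With 5 batches per epoch the estimated correlation should sit near -0.25. Sampling with replacement would remove the zero-sum constraint and with it the anti-correlation.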
Findings
Noise in epoch-based SGD is inherently anti-correlated over time.
Anti-correlations reduce weight variance in flat directions.
Training with anti-correlated noise improves test performance.
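The second finding can be checked numerically in a one-dimensional toy setting. The following is a minimal sketch under stated assumptions, not the authors' setup: it uses plain SGD without momentum on a quadratic loss with a small curvature (a "flat" direction), weight-independent noise built from hypothetical fixed per-example values, and a time-shuffled copy of the same noise sequence as an uncorrelated control. The helper `stationary_variance` and all parameter values are illustrative.

```python
import numpy as np

def stationary_variance(noise, lr, curvature):
    """Run w <- w - lr * (curvature * w + noise_t) and return the weight variance."""
    w = 0.0
    trace = np.empty(noise.size)
    for t, xi in enumerate(noise):
        w -= lr * (curvature * w + xi)
        trace[t] = w
    return trace[trace.size // 10:].var()           # drop the first 10% as burn-in

rng = np.random.default_rng(1)
n, batch_size, n_epochs = 100, 20, 40000
per_example = rng.normal(size=n)                    # hypothetical per-example gradients
per_example -= per_example.mean()

# Epoch-based noise: each epoch is a fresh shuffle split into batches.
epoch_noise = np.concatenate([
    per_example[rng.permutation(n)].reshape(-1, batch_size).mean(axis=1)
    for _ in range(n_epochs)
])
# Control: same values in random temporal order, which destroys the epoch structure.
shuffled_noise = rng.permutation(epoch_noise)

lr, curvature = 0.1, 0.01                           # relaxation time 1/(lr*curvature) >> epoch length
var_epoch = stationary_variance(epoch_noise, lr, curvature)
var_shuffled = stationary_variance(shuffled_noise, lr, curvature)
print(f"epoch-based: {var_epoch:.4g}, shuffled control: {var_shuffled:.4g}")
```

In this setting the relaxation time of the weight (1000 steps) is much longer than an epoch (5 steps), so the zero-sum noise within each epoch largely cancels and the weight variance comes out far below that of the shuffled control. Raising `curvature` until the relaxation time is shorter than an epoch should shrink the gap.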
Abstract
Stochastic Gradient Descent (SGD) has become a cornerstone of neural network optimization due to its computational efficiency and generalization capabilities. However, the gradient noise introduced by SGD is often assumed to be uncorrelated over time, despite the common practice of epoch-based training where data is sampled without replacement. In this work, we challenge this assumption and investigate the effects of epoch-based noise correlations on the stationary distribution of discrete-time SGD with momentum. Our main contributions are twofold: First, we calculate the exact autocorrelation of the noise during epoch-based training under the assumption that the noise is independent of small fluctuations in the weight vector, revealing that SGD noise is inherently anti-correlated over time. Second, we explore the influence of these anti-correlations on the variance of weight…
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
