Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis
Carles Domingo-Enrich

TL;DR
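This paper analyzes the stationary variance of shuffling stochastic gradient algorithms, such as random reshuffling (SGD-RR) and shuffle-once (SGD-SO), through the power spectral density of the gradient noise. It obtains simple variance approximations and extends the results to SGD with momentum and to Nesterov's accelerated gradient method.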
Contribution
It introduces a power-spectral-density approach to approximating the stationary variance of the iterates of SGD and its shuffled variants, including versions with momentum and Nesterov acceleration.
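A minimal sketch of the frequency-domain idea follows. The setup is our own and is not taken from the paper. We assume a one-dimensional quadratic f_i(x) = (a/2)(x - b_i)^2. The deviation e_k = x_k - x* then obeys e_{k+1} = ρ e_k + c u_k, with ρ = 1 - ηa, c = ηa, and noise u_k = b_{i_k} - x*. Its stationary variance is (1/2π)∫|H(ω)|² S(ω) dω, where |H(ω)|² = c²/(1 - 2ρ cos ω + ρ²).

For SGD the noise is white. For SGD-RR we assume a stationarized autocovariance, time-averaged over epoch positions: R(m) = -σ²(n - |m|)/(n(n - 1)) for 0 < |m| < n, and zero beyond. Its PSD vanishes at ω = 0 because each epoch's noise sums to zero. This may differ from the paper's exact noise approximation, and the function names are hypothetical.

```python
import math


def noise_psd(acov, w):
    """PSD S(w) = sum_m R(|m|) e^{-i w m} of a stationary sequence whose
    autocovariance is acov[m] for m = 0..len(acov)-1 and zero beyond."""
    return acov[0] + 2.0 * sum(acov[m] * math.cos(m * w) for m in range(1, len(acov)))


def psd_variance(acov, rho, c, n_grid=1 << 14):
    """Stationary variance of e_{k+1} = rho*e_k + c*u_k, computed as
    (1/2pi) * integral over [-pi, pi) of |H(w)|^2 S(w) dw on a uniform grid."""
    total = 0.0
    for j in range(n_grid):
        w = -math.pi + 2.0 * math.pi * j / n_grid
        h2 = c * c / (1.0 - 2.0 * rho * math.cos(w) + rho * rho)  # |H(w)|^2
        total += h2 * noise_psd(acov, w)
    return total / n_grid  # (1/2pi) * (2pi / n_grid) per grid point


n, sigma2, eta_a = 10, 1.0, 0.01  # hypothetical problem size, noise variance, eta*a
rho, c = 1.0 - eta_a, eta_a

acov_sgd = [sigma2]  # i.i.d. sampling with replacement: white noise
# Hypothetical stationarized RR autocovariance (time-averaged over epoch positions)
acov_rr = [sigma2] + [-sigma2 * (n - m) / (n * (n - 1)) for m in range(1, n)]

var_sgd = psd_variance(acov_sgd, rho, c)
var_rr = psd_variance(acov_rr, rho, c)
print(f"S_RR(0) = {noise_psd(acov_rr, 0.0):.2e}")  # ~0: each epoch's noise sums to zero
print(f"SGD: {var_sgd:.3e}  (closed form {c * c * sigma2 / (1 - rho * rho):.3e})")
print(f"RR : {var_rr:.3e}")
```

The filter |H(ω)|² is sharply peaked at ω = 0 when ηa is small. Because S_RR suppresses exactly those low frequencies, the RR variance comes out much smaller than the SGD variance.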
Findings
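The leading terms of the stationary variances decrease in the order SGD, SGD-RR, SGD-SO.
Analyzing the power spectral density of the stochastic gradient noise sequences yields simple approximations of these variances.
Experiments on quadratic objective functions test the validity of the noise approximation and the correctness of the findings.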
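The ordering above can be checked empirically on a toy problem. The sketch below uses our own setup, not the paper's experiments. It takes a hypothetical one-dimensional finite sum f_i(x) = (a/2)(x - b_i)^2, whose minimizer x* is the mean of the b_i. Each method runs with a constant step size, and the sketch reports the mean squared deviation of the iterates from x* after a burn-in.

For SGD-SO, the single permutation is drawn once per run, and results are averaged over many such draws. For SGD with replacement, the stationary value has the closed form ηaσ²/(2 - ηa), where σ² is the population variance of the b_i. The helper names (msd_sgd, msd_rr, msd_so) are hypothetical.

```python
import random
import statistics

# Hypothetical toy problem: f_i(x) = a/2 * (x - b_i)^2, minimizer x* = mean(b).
n, a, eta = 10, 1.0, 0.01
rng0 = random.Random(0)
b = [rng0.gauss(0.0, 1.0) for _ in range(n)]
x_star = statistics.fmean(b)
sigma2 = statistics.pvariance(b)


def step(x, i):
    """One stochastic gradient step on f_i."""
    return x - eta * a * (x - b[i])


def msd_sgd(steps=200_000, burn_in=1_000, seed=1):
    """SGD with replacement: i.i.d. uniform indices."""
    rng, x, acc = random.Random(seed), x_star, 0.0
    for k in range(burn_in + steps):
        if k >= burn_in:
            acc += (x - x_star) ** 2
        x = step(x, rng.randrange(n))
    return acc / steps


def msd_rr(epochs=20_000, burn_in=100, seed=2):
    """SGD-RR: a fresh permutation at the start of every epoch."""
    rng, x, acc = random.Random(seed), x_star, 0.0
    order = list(range(n))
    for ep in range(burn_in + epochs):
        rng.shuffle(order)
        for i in order:
            if ep >= burn_in:
                acc += (x - x_star) ** 2
            x = step(x, i)
    return acc / (epochs * n)


def msd_so(draws=200, burn_in=200, seed=3):
    """SGD-SO: one permutation per run, reused every epoch; averaged over draws."""
    rng, total = random.Random(seed), 0.0
    for _ in range(draws):
        order = list(range(n))
        rng.shuffle(order)  # shuffled once for this run
        x = x_star
        for _ in range(burn_in):  # converge to the periodic orbit
            for i in order:
                x = step(x, i)
        for i in order:  # average over one full cycle of the orbit
            total += (x - x_star) ** 2
            x = step(x, i)
    return total / (draws * n)


sgd, rr, so = msd_sgd(), msd_rr(), msd_so()
print(f"SGD {sgd:.3e} (closed form {eta * a * sigma2 / (2 - eta * a):.3e})")
print(f"RR  {rr:.3e}")
print(f"SO  {so:.3e}")
```

Starting every run at x* keeps the burn-in short. For SGD-SO, the iterates settle onto a periodic orbit, so averaging over one cycle per permutation suffices.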
Abstract
When solving finite-sum minimization problems, two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGD-RR) and shuffle-once (SGD-SO), in which functions are sampled in cycles without replacement. Under a convenient stochastic noise approximation which holds experimentally, we study the stationary variances of the iterates of SGD, SGD-RR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations. To obtain our results, we study the power spectral density of the stochastic gradient noise sequences. Our analysis extends beyond SGD to SGD with momentum and to the stochastic Nesterov's accelerated gradient method. We perform experiments on quadratic objective functions to test the validity of our approximation and the correctness of our findings.
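The same frequency-domain recipe carries over to methods with memory, because the deviation then follows a second-order linear recursion. The sketch below again uses a hypothetical 1-D quadratic, with ηa = 0.01, β = 0.9, and our own parametrization:

- Heavy-ball momentum gives e_{k+1} = (1 + β - ηa) e_k - β e_{k-1} + ηa u_k.
- Nesterov's method, written in look-ahead form, gives e_{k+1} = (1 - ηa)[(1 + β) e_k - β e_{k-1}] + ηa u_k.

Both are AR(2) filters, so the variance is (1/2π)∫|H(ω)|² S(ω) dω with |H(ω)|² = (ηa)²/|1 - φ1 e^{-iω} - φ2 e^{-2iω}|². The RR noise model is the same stationarized assumption as in the sketch above, and the white-noise case is checked against the textbook AR(2) variance formula. This illustrates the mechanism only and does not reproduce the paper's formulas.

```python
import math


def noise_psd(acov, w):
    """PSD of a stationary sequence with autocovariance acov[m], zero beyond."""
    return acov[0] + 2.0 * sum(acov[m] * math.cos(m * w) for m in range(1, len(acov)))


def ar2_psd_variance(acov, phi1, phi2, c, n_grid=1 << 14):
    """Variance of e_{k+1} = phi1*e_k + phi2*e_{k-1} + c*u_k via the PSD integral."""
    total = 0.0
    for j in range(n_grid):
        w = -math.pi + 2.0 * math.pi * j / n_grid
        # |1 - phi1 e^{-iw} - phi2 e^{-2iw}|^2 split into real and imaginary parts
        re = 1.0 - phi1 * math.cos(w) - phi2 * math.cos(2.0 * w)
        im = phi1 * math.sin(w) + phi2 * math.sin(2.0 * w)
        total += c * c / (re * re + im * im) * noise_psd(acov, w)
    return total / n_grid


def ar2_white_variance(phi1, phi2, c, sigma2):
    """Textbook stationary variance of an AR(2) process driven by white noise."""
    return (1 - phi2) * c * c * sigma2 / ((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2))


n, sigma2, eta_a, beta = 10, 1.0, 0.01, 0.9  # hypothetical values
acov_sgd = [sigma2]
acov_rr = [sigma2] + [-sigma2 * (n - m) / (n * (n - 1)) for m in range(1, n)]
methods = {
    "heavy ball": (1 + beta - eta_a, -beta),
    "Nesterov": ((1 - eta_a) * (1 + beta), -(1 - eta_a) * beta),
}
results = {}
for name, (phi1, phi2) in methods.items():
    v_sgd = ar2_psd_variance(acov_sgd, phi1, phi2, eta_a)
    v_rr = ar2_psd_variance(acov_rr, phi1, phi2, eta_a)
    results[name] = (v_sgd, v_rr)
    ref = ar2_white_variance(phi1, phi2, eta_a, sigma2)
    print(f"{name}: SGD {v_sgd:.3e} (closed form {ref:.3e}), RR {v_rr:.3e}")
```

With β = 0.9, the filter resonates at a small nonzero frequency rather than at ω = 0. The RR noise spectrum is still small there, so the reshuffled variance stays well below the with-replacement value.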
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Methods and Inference
Methods: Stochastic Gradient Descent · SGD with Momentum
