Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models
Courtney Paquette, Elliot Paquette

TL;DR
This paper gives an exact analysis of stochastic momentum methods on large-scale quadratic models, shows that small-batch heavy-ball momentum gains nothing over well-tuned SGD, and proposes a new algorithm, sDANA, that achieves asymptotically optimal average-case complexity without parameter tuning.
Contribution
It introduces a deterministic framework for analyzing stochastic momentum methods and proposes sDANA, a new algorithm with optimal average-case complexity and parameter-free convergence.
Findings
Small-batch stochastic heavy-ball with a fixed momentum parameter offers no performance gain over SGD once step sizes are adjusted correctly.
Momentum can significantly improve performance in non-strongly convex settings.
sDANA achieves asymptotically optimal complexity without parameter tuning.
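For orientation, the two methods compared in the first finding can be written in generic notation. The symbols below (step size \gamma, momentum parameter \Delta, sampled index i_k, data row a_i, target b_i) are assumptions chosen for illustration and are not taken from the paper.

```latex
% Single-sample least-squares loss (illustrative notation)
f_i(x) = \tfrac{1}{2}\,\bigl(a_i^\top x - b_i\bigr)^2,
\qquad
\nabla f_i(x) = \bigl(a_i^\top x - b_i\bigr)\, a_i .

% Stochastic heavy-ball with fixed momentum \Delta \in [0, 1)
x_{k+1} = x_k - \gamma\, \nabla f_{i_k}(x_k) + \Delta\,\bigl(x_k - x_{k-1}\bigr).

% Setting \Delta = 0 recovers plain SGD:
x_{k+1} = x_k - \gamma\, \nabla f_{i_k}(x_k).
```

A common heuristic, stated here as background rather than as the paper's result, is that each gradient is reused with geometrically decaying weight 1, \Delta, \Delta^2, and so on, so heavy-ball with step \gamma behaves roughly like SGD with step \gamma / (1 - \Delta). The first finding is consistent with that picture.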
Abstract
We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization for the sequence of loss values produced by these algorithms which is expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. For contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum. By introducing hyperparameters that depend on the number of samples, we propose a new…
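The setting in the abstract can be tried numerically with a minimal sketch like the one below. It is not the authors' code: the problem sizes, the Gaussian data model, the function names (`make_problem`, `loss`, `stochastic_heavy_ball`), and the step sizes are all assumptions made for illustration. It runs single-sample stochastic heavy-ball on a random, noiseless least squares problem. Setting `momentum=0.0` gives plain SGD.

```python
import numpy as np


def make_problem(n=200, d=100, seed=0):
    """Random noiseless least squares problem (assumed Gaussian data model)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d)) / np.sqrt(d)  # rows have squared norm near 1
    x_star = rng.standard_normal(d)               # hypothetical ground-truth signal
    b = A @ x_star
    return A, b


def loss(A, b, x):
    """Full least squares loss, averaged over samples."""
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)


def stochastic_heavy_ball(A, b, step, momentum, iters, seed=1):
    """Single-sample stochastic heavy-ball; momentum=0.0 reduces to SGD."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_prev = x.copy()
    losses = [loss(A, b, x)]
    for _ in range(iters):
        i = rng.integers(n)                        # sample one row uniformly
        grad = (A[i] @ x - b[i]) * A[i]            # gradient of the i-th squared residual
        x_next = x - step * grad + momentum * (x - x_prev)
        x_prev, x = x, x_next
        losses.append(loss(A, b, x))
    return np.array(losses)


if __name__ == "__main__":
    A, b = make_problem()
    # Assumed pairing: heavy-ball step chosen so step / (1 - momentum) matches the SGD step.
    sgd = stochastic_heavy_ball(A, b, step=0.5, momentum=0.0, iters=3000)
    shb = stochastic_heavy_ball(A, b, step=0.25, momentum=0.5, iters=3000)
    print("final loss, SGD:       ", sgd[-1])
    print("final loss, heavy-ball:", shb[-1])
```

The pairing of step sizes uses the `step / (1 - momentum)` rule of thumb, which is only a heuristic here. The paper derives its own near-optimal hyperparameters from the Hessian eigenvalues, and those are not reproduced in this sketch.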
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Random Matrices and Applications
Methods: Stochastic Gradient Descent
