On the fast convergence of minibatch heavy ball momentum
Raghu Bollapragada, Tyler Chen, Rachel Ward

TL;DR
This paper proves that stochastic heavy ball momentum retains the fast linear convergence rate of deterministic heavy ball momentum on quadratic problems when the minibatch size is sufficiently large, narrowing the gap between the theory and practice of stochastic momentum methods.
Contribution
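The paper shows that stochastic heavy ball momentum retains the convergence rate of deterministic heavy ball momentum on quadratic problems when the minibatch size is sufficiently large. The analysis decomposes the momentum transition matrix and applies new spectral norm concentration bounds for products of independent random matrices.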
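For context, the deterministic rate referred to here is the classical one for heavy ball momentum on a quadratic. The notation below ($\mu$, $L$, $\kappa$, $\alpha$, $\beta$) is not taken from the listing; it is the standard textbook setup, and the paper's own parameter choices and constants may differ.

$$
x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta\,(x_k - x_{k-1}),
\qquad
\alpha = \frac{4}{(\sqrt{L} + \sqrt{\mu})^2},
\qquad
\beta = \left(\frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}\right)^{2}
$$

Here $\mu$ and $L$ are the smallest and largest eigenvalues of the Hessian of the quadratic $f$, and $\kappa = L/\mu$. With these parameters the iterates converge linearly at the asymptotic rate

$$
\sqrt{\beta} = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1},
\qquad\text{compared with}\qquad
\frac{\kappa - 1}{\kappa + 1}
$$

for gradient descent with its best fixed step size. The dependence on $\sqrt{\kappa}$ rather than $\kappa$ is what "fast" means in this setting.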
Findings
Fast linear convergence for stochastic heavy ball with large minibatches
Spectral norm bounds for products of random matrices
Numerical results confirming theoretical bounds
Abstract
Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size. The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical illustrations demonstrating that our bounds are reasonably sharp.
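The following is a minimal sketch of minibatch heavy ball momentum on a consistent least-squares problem, intended only to make the update concrete. Several choices are assumptions not taken from the abstract: rows are sampled uniformly with replacement, the step size `alpha` and momentum `beta` are the classical deterministic heavy ball values, and the function name `minibatch_heavy_ball`, the problem sizes, and the batch size are illustrative. The paper's actual sampling scheme and parameter choices may differ.

```python
import numpy as np


def minibatch_heavy_ball(A, b, batch_size, alpha, beta, num_iters, rng):
    """Sketch: minimize 0.5 * ||A x - b||^2 with minibatch heavy ball momentum.

    Assumption: rows are sampled uniformly with replacement, and the
    minibatch gradient is rescaled by n / batch_size so that it is an
    unbiased estimate of the full gradient A^T (A x - b).
    """
    n, d = A.shape
    x_prev = np.zeros(d)
    x = np.zeros(d)
    for _ in range(num_iters):
        idx = rng.integers(0, n, size=batch_size)  # sampled row indices
        A_S, b_S = A[idx], b[idx]
        grad = (n / batch_size) * (A_S.T @ (A_S @ x - b_S))
        # Heavy ball step: gradient step plus momentum term.
        x_next = x - alpha * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star  # consistent system, so x_star is an exact solution

# Classical deterministic heavy ball parameters from the extreme
# eigenvalues of A^T A (an assumption, not the paper's prescription).
eigs = np.linalg.eigvalsh(A.T @ A)
mu, L = eigs[0], eigs[-1]
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_hat = minibatch_heavy_ball(
    A, b, batch_size=100, alpha=alpha, beta=beta, num_iters=300, rng=rng
)
# The iteration starts at zero, so this is the error relative to the initial error.
rel_err = np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star)
print(f"relative error after 300 iterations: {rel_err:.2e}")
```

Because the system is consistent, the minibatch gradient vanishes at the solution, so the iterates can converge to `x_star` itself rather than to a noise floor. Reducing `batch_size` in this sketch is a simple way to see how smaller batches affect convergence.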
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
