On the fast convergence of minibatch heavy ball momentum

Raghu Bollapragada; Tyler Chen; Rachel Ward

arXiv:2206.07553·cs.LG·June 24, 2025·1 cites

Open Access

TL;DR

This paper proves that stochastic heavy ball momentum retains the fast linear convergence rate of deterministic heavy ball momentum on quadratic problems when the minibatch size is sufficiently large, narrowing the gap between the theory and practice of stochastic momentum methods.

Contribution

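The paper shows that, on quadratic problems, stochastic heavy ball momentum retains the fast linear convergence rate of deterministic heavy ball momentum once the minibatch size is sufficiently large. The analysis decomposes the momentum transition matrix and applies new spectral norm concentration bounds for products of independent random matrices.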

Findings

01. Fast linear convergence for stochastic heavy ball with large minibatches

02. Spectral norm bounds for products of random matrices

03. Numerical results confirming theoretical bounds

Abstract

Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size. The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical illustrations demonstrating that our bounds are reasonably sharp.
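
The abstract describes the algorithm only in words, so the following is a minimal sketch of minibatch heavy ball momentum on a consistent least squares problem, not the authors' implementation. Several details are assumptions made for illustration: the function names `polyak_parameters` and `minibatch_heavy_ball`, the Kaczmarz-style sampling of rows with probability proportional to their squared norms, the importance-weighted gradient estimate, and the use of the classical deterministic heavy ball step size and momentum. The parameter choices analyzed in the paper may differ.

```python
import numpy as np


def polyak_parameters(A):
    """Classical (deterministic) heavy ball parameters for 0.5 * ||Ax - b||^2.

    Assumption: these are the textbook choices, not necessarily the paper's.
    """
    eigs = np.linalg.eigvalsh(A.T @ A)  # Hessian eigenvalues, ascending
    mu, L = eigs[0], eigs[-1]
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    return alpha, beta


def minibatch_heavy_ball(A, b, alpha, beta, batch_size, num_iters, seed=0):
    """Heavy ball momentum with a minibatch estimate of the gradient A^T (Ax - b)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # Assumed sampling scheme: row i is drawn with probability ||a_i||^2 / ||A||_F^2.
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()

    x_prev = np.zeros(d)
    x = np.zeros(d)
    for _ in range(num_iters):
        idx = rng.choice(n, size=batch_size, p=probs)  # sampled with replacement
        residual = A[idx] @ x - b[idx]
        # Importance weights make the estimate unbiased for A^T (Ax - b).
        weights = 1.0 / (batch_size * probs[idx])
        grad = A[idx].T @ (weights * residual)
        # Heavy ball update: gradient step plus momentum term.
        x_next = x - alpha * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Small well-conditioned, consistent system used only as a demonstration.
demo_rng = np.random.default_rng(1)
A = demo_rng.standard_normal((200, 10))
x_true = demo_rng.standard_normal(10)
b = A @ x_true

alpha, beta = polyak_parameters(A)
x_hat = minibatch_heavy_ball(A, b, alpha, beta, batch_size=100, num_iters=300)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

Shrinking `batch_size` in this sketch increases the variance of the gradient estimate, which is the regime where, according to the abstract, the fast rate of deterministic heavy ball momentum is no longer guaranteed.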

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference