Accelerating Variance-Reduced Stochastic Gradient Methods
Derek Driggs, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb

TL;DR
This paper introduces a universal acceleration framework for variance-reduced stochastic gradient methods, enabling them to achieve faster convergence without relying on negative momentum, and demonstrates its effectiveness through numerical experiments.
Contribution
The authors develop a universal acceleration framework that allows all popular variance-reduced methods to attain accelerated convergence rates without negative momentum.
Findings
Accelerated methods outperform non-accelerated versions in experiments.
The framework applies to SAGA, SVRG, SARAH, and SARGE.
The constants in the convergence rates scale with the mean-squared error and bias of the gradient estimator.
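To make "bias" concrete, the sketch below implements three of the estimators named above (SAGA, SVRG, SARAH) in their standard textbook forms on a tiny scalar least-squares problem. SARGE is omitted. The problem data and all function names (`grad_i`, `saga_estimate`, `expected_over_j`, and so on) are hypothetical illustrations, not code from the paper. The sketch computes each estimator's exact expectation over a uniformly sampled index. SAGA and SVRG recover the full gradient (unbiased). SARAH's expectation inherits any error in its previous estimate (biased).

```python
import math
from typing import Callable, List

# Hypothetical toy problem (not from the paper): F(x) = (1/n) * sum_i f_i(x)
# with scalar components f_i(x) = 0.5 * (a_i * x - b_i) ** 2.
A = [1.0, 2.0, -1.0, 0.5]
B = [0.0, 1.0, 2.0, -1.0]
N = len(A)


def grad_i(i: int, x: float) -> float:
    """Gradient of the i-th component f_i at x."""
    return A[i] * (A[i] * x - B[i])


def full_grad(x: float) -> float:
    """Gradient of F at x: the average of all component gradients."""
    return sum(grad_i(i, x) for i in range(N)) / N


def saga_estimate(j: int, x: float, table: List[float]) -> float:
    """SAGA: grad_j(x) - table[j] + mean(table); table[i] holds a stored gradient of f_i."""
    return grad_i(j, x) - table[j] + sum(table) / N


def svrg_estimate(j: int, x: float, snapshot: float, snapshot_grad: float) -> float:
    """SVRG: grad_j(x) - grad_j(snapshot) + full gradient at the snapshot."""
    return grad_i(j, x) - grad_i(j, snapshot) + snapshot_grad


def sarah_estimate(j: int, x: float, x_prev: float, v_prev: float) -> float:
    """SARAH: grad_j(x) - grad_j(x_prev) + v_prev, where v_prev is the previous estimate."""
    return grad_i(j, x) - grad_i(j, x_prev) + v_prev


def expected_over_j(estimator: Callable[[int], float]) -> float:
    """Exact expectation of an estimator over a uniformly sampled index j."""
    return sum(estimator(j) for j in range(N)) / N


if __name__ == "__main__":
    x_old, x = 0.0, 1.0
    table = [grad_i(i, x_old) for i in range(N)]  # gradients stored at an older point
    saga_mean = expected_over_j(lambda j: saga_estimate(j, x, table))
    print(math.isclose(saga_mean, full_grad(x)))  # True: SAGA is unbiased
    svrg_mean = expected_over_j(lambda j: svrg_estimate(j, x, x_old, full_grad(x_old)))
    print(math.isclose(svrg_mean, full_grad(x)))  # True: SVRG is unbiased
    v_prev = full_grad(x_old) + 0.5  # deliberately inexact previous estimate
    sarah_mean = expected_over_j(lambda j: sarah_estimate(j, x, x_old, v_prev))
    print(sarah_mean - full_grad(x))  # 0.5: the error in v_prev carries over as bias
```

The toy data are chosen so every quantity is exactly representable in floating point. The printed bias is therefore exactly the 0.5 offset injected into `v_prev`.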
Abstract
Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on "negative momentum", a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions, scale with the mean-squared error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of…
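For orientation, the "negative momentum" mentioned in the abstract is the coupling step of Katyusha (Allen-Zhu), an accelerated SVRG variant. It is shown below in its standard form next to a plain Nesterov extrapolation step. Neither line is this paper's framework. The notation is assumed for illustration: $\tilde{x}$ is the SVRG snapshot, $z_k$ and $y_k$ are auxiliary iterate sequences, $\tau_1, \tau_2, \beta, \eta$ are weights and step sizes, and $\widetilde{\nabla}_k$ is a gradient estimate.

```latex
% Nesterov-style extrapolation (constant momentum beta), with a gradient estimate taken at y_k
y_k = x_k + \beta\,(x_k - x_{k-1}), \qquad
x_{k+1} = y_k - \eta\,\widetilde{\nabla}_k

% Katyusha coupling: the tau_2 term pulls the iterate back toward the SVRG snapshot
x_{k+1} = \tau_1 z_k + \tau_2 \tilde{x} + (1 - \tau_1 - \tau_2)\, y_k
```

The $\tau_2 \tilde{x}$ term depends on a fixed snapshot point. This is consistent with the abstract's remark that negative momentum is generally specific to the SVRG estimator.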
