Accelerating Variance-Reduced Stochastic Gradient Methods

Derek Driggs; Matthias J. Ehrhardt; Carola-Bibiane Schönlieb

arXiv:1910.09494·math.OC·October 30, 2020·Math. Program.

TL;DR

This paper introduces a universal acceleration framework for variance-reduced stochastic gradient methods, enabling them to achieve faster convergence without relying on negative momentum, and demonstrates its effectiveness through numerical experiments.

Contribution

The authors develop a universal acceleration framework that allows all popular variance-reduced methods to attain accelerated convergence rates without negative momentum.

Findings

01

Accelerated methods outperform non-accelerated versions in experiments.

02

The framework applies to SAGA, SVRG, SARAH, and SARGE.

03

Constants in the convergence rates depend on the bias and mean-squared error of the gradient estimator.
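
To make the role of estimator bias concrete, the following is a minimal sketch of the standard SVRG, SAGA, and SARAH gradient estimators on a toy scalar least-squares problem. It illustrates the estimators only, not the paper's acceleration framework. The data `A` and `B`, the test points, and all function names are hypothetical and chosen purely for illustration.

```python
# Toy problem (hypothetical data): f_i(x) = 0.5 * (a_i * x - b_i)^2, scalar x.
A = [1.0, 2.0, -1.5, 0.5]
B = [0.5, -1.0, 2.0, 3.0]
N = len(A)


def grad_i(i, x):
    """Gradient of the i-th component function at x."""
    return A[i] * (A[i] * x - B[i])


def full_grad(x):
    """Gradient of the average (1/N) * sum_i f_i at x."""
    return sum(grad_i(i, x) for i in range(N)) / N


def svrg_estimate(i, x, x_snap, g_snap):
    """SVRG: correct grad_i(x) using a snapshot point and its full gradient."""
    return grad_i(i, x) - grad_i(i, x_snap) + g_snap


def saga_estimate(i, x, table):
    """SAGA: correct grad_i(x) using a table of previously stored gradients."""
    return grad_i(i, x) - table[i] + sum(table) / N


def sarah_estimate(i, x, x_prev, g_prev):
    """SARAH: recursive update built on the previous estimate g_prev."""
    return grad_i(i, x) - grad_i(i, x_prev) + g_prev


def mean_over_indices(estimator):
    """Exact expectation over a uniformly sampled index i."""
    return sum(estimator(i) for i in range(N)) / N


x, x_old = 0.7, -0.2
g_snap = full_grad(x_old)
# For simplicity, every stored SAGA gradient was evaluated at x_old.
table = [grad_i(i, x_old) for i in range(N)]
# Assume the previous SARAH estimate is off from the true gradient by 0.1.
g_prev = full_grad(x_old) + 0.1

svrg_bias = mean_over_indices(
    lambda i: svrg_estimate(i, x, x_old, g_snap)) - full_grad(x)
saga_bias = mean_over_indices(
    lambda i: saga_estimate(i, x, table)) - full_grad(x)
sarah_bias = mean_over_indices(
    lambda i: sarah_estimate(i, x, x_old, g_prev)) - full_grad(x)

print("SVRG bias: ", svrg_bias)
print("SAGA bias: ", saga_bias)
print("SARAH bias:", sarah_bias)
```

In this sketch, averaging over the sampled index shows that the SVRG and SAGA estimates are unbiased up to floating-point rounding, while the SARAH estimate inherits whatever error the previous estimate carried. This is the kind of estimator bias the third finding refers to.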

Abstract

Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on "negative momentum", a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions $n$, scale with the mean-squared-error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of…
