Nesterov's method with decreasing learning rate leads to accelerated stochastic gradient descent

Maxime Laborde; Adam M. Oberman

arXiv:1908.07861·math.OC·September 2, 2020·6 cites

Open Access

TL;DR

This paper derives new stochastic gradient descent algorithms from a coupled ODE system, demonstrating accelerated convergence rates with decreasing learning rates in both convex and strongly convex settings.

Contribution

The paper introduces novel SGD algorithms, obtained by discretizing a coupled ODE system with a decreasing learning rate, that achieve the optimal last-iterate convergence rate with improved rate constants.

Findings

01 Derived SGD algorithms with accelerated convergence.

02 Proved convergence at the optimal rate for the last iterate.

03 Achieved better rate constants than previous methods.

Abstract

We present a coupled system of ODEs which, when discretized with a constant time step/learning rate, recovers Nesterov's accelerated gradient descent algorithm. The same ODEs, when discretized with a decreasing learning rate, lead to novel stochastic gradient descent (SGD) algorithms, one in the convex and a second in the strongly convex case. In the strongly convex case, we obtain an algorithm superficially similar to momentum SGD, but with additional terms. In the convex case, we obtain an algorithm with a novel order $k^{3/4}$ learning rate. We prove, extending the Lyapunov function approach from the full gradient case to the stochastic case, that the algorithms converge at the optimal rate for the last iterate of SGD, with rate constants which are better than previously available.
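
To make the idea of pairing Nesterov-style momentum with a decreasing learning rate concrete, here is a minimal Python sketch. It is not the algorithm derived in the paper, whose additional terms are not given in the abstract. It uses the classical convex-case momentum weight $k/(k+3)$, a one-dimensional quadratic objective, additive Gaussian gradient noise, and a step size $h_k = h_0/(k+1)^{3/4}$. The function name `nesterov_sgd`, the parameters `h0`, `noise_std`, and `seed`, and the reading of the "order $k^{3/4}$" schedule as a decay exponent are all assumptions made for illustration.

```python
import random


def nesterov_sgd(grad, x0, steps, h0=0.5, noise_std=0.0, seed=0):
    """Nesterov-style iteration with noisy gradients and a decreasing step size.

    Illustrative only: the schedule h_k = h0 / (k + 1) ** 0.75 and the
    momentum weight k / (k + 3) are assumptions, not the paper's scheme.
    """
    rng = random.Random(seed)
    x_prev = x = float(x0)
    for k in range(steps):
        beta = k / (k + 3)           # classical convex-case momentum weight
        y = x + beta * (x - x_prev)  # extrapolated (look-ahead) point
        h = h0 / (k + 1) ** 0.75     # decreasing learning rate (assumed schedule)
        g = grad(y) + rng.gauss(0.0, noise_std)  # true gradient plus Gaussian noise
        x_prev, x = x, y - h * g     # gradient step taken from the look-ahead point
    return x                         # last iterate, not an average


def f(x):
    """Toy objective: f(x) = x^2 / 2, minimized at x = 0."""
    return 0.5 * x * x


def grad_f(x):
    """Exact gradient of the toy objective."""
    return x


if __name__ == "__main__":
    x_last = nesterov_sgd(grad_f, x0=1.0, steps=200, noise_std=0.1, seed=0)
    print("f at last iterate:", f(x_last))
```

The sketch returns the last iterate rather than an average of iterates, matching the quantity the abstract's convergence statement refers to. Setting `noise_std=0.0` gives the deterministic (full gradient) iteration with the same schedule.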

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Neural Networks and Applications