Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

Arnulf Jentzen; Philippe von Wurstemberger

arXiv:1803.08600·math.NA·October 5, 2020·J. Complex.·1 cites

Open Access

TL;DR

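This paper establishes essentially matching lower and upper bounds for the mean square error of stochastic gradient descent with polynomially decaying learning rates $\gamma / n^{\nu}$ on a simple quadratic stochastic optimization problem, which quantifies how the convergence rate depends on the decay of the learning rates.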

Contribution

It proves lower bounds for SGD's mean square error that essentially match the corresponding upper bounds for every choice of $\gamma, \nu \in (0, \infty)$ in the learning rates $\gamma / n^{\nu}$.

Findings

01. Matching lower and upper bounds for SGD error rates.

02. Convergence rates depend on the decay behavior of learning rates.

03. Quantitative analysis of SGD performance for quadratic problems.

Abstract

The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast number of upper error bounds for the SGD method. Much less attention has been paid to proving lower error bounds for the SGD method. It is the key contribution of this paper to make a step in this direction. More precisely, in this article we establish for every $\gamma, \nu \in (0, \infty)$ essentially matching lower and upper bounds for the mean square error of the SGD process with learning rates $(\frac{\gamma}{n^{\nu}})_{n \in \mathbb{N}}$ associated with a simple quadratic stochastic optimization problem. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the asymptotic behavior of the learning rates.
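
To make the setting in the abstract concrete, the following is a minimal simulation sketch, not the authors' code. It assumes a hypothetical one-dimensional quadratic problem, minimizing $\mathbb{E}[(\theta - X)^2]/2$ with $X \sim N(0, 1)$ (so the minimizer is $0$); the abstract does not spell out the exact problem, and the function name `sgd_mse` and its parameters are illustrative only. It estimates the mean square error of SGD with learning rates $\gamma / n^{\nu}$ by Monte Carlo.

```python
import random


def sgd_mse(gamma, nu, n_steps, n_runs=2000, theta0=1.0, seed=0):
    """Monte Carlo estimate of E[theta_n^2] for SGD on an assumed toy problem.

    Assumed objective (illustrative, not taken from the paper):
    minimize E[(theta - X)^2] / 2 with X ~ N(0, 1), whose minimizer is 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        theta = theta0
        for n in range(1, n_steps + 1):
            x = rng.gauss(0.0, 1.0)      # fresh sample X_n
            lr = gamma / n ** nu         # learning rate gamma / n^nu
            theta -= lr * (theta - x)    # stochastic gradient of (theta - x)^2 / 2
        total += theta ** 2              # squared distance to the minimizer 0
    return total / n_runs


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        print(nu, sgd_mse(gamma=1.0, nu=nu, n_steps=100))
```

In this toy setup with $\gamma = 1$ and $\nu = 1$, the iterate after $n$ steps is exactly the running sample mean, so its mean square error is $1/n$. With $\nu = 2$ the learning rates are summable and the error stays bounded away from zero, which illustrates why the decay exponent matters.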

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent