Strong error analysis for stochastic gradient descent optimization algorithms

Arnulf Jentzen; Benno Kuckuck; Ariel Neufeld; Philippe von Wurstemberger

arXiv:1801.09324·math.NA·October 5, 2020

TL;DR

This paper gives a rigorous strong error analysis for stochastic gradient descent (SGD) algorithms. Using techniques from the theory of Lyapunov-type functions, it proves convergence to the global minimum in the strong L^p sense with order 1/2 - epsilon, under standard convexity-type assumptions on the objective function.

Contribution

It develops a convergence framework for SGD based on Lyapunov-type functions and uses it to establish strong L^p convergence with order 1/2 - epsilon for every arbitrarily large p and every arbitrarily small epsilon > 0.

Findings

01

Proves strong L^p convergence of SGD to the global minimum with order 1/2 - epsilon, for arbitrarily small epsilon > 0.

02

Develops a general convergence machinery using Lyapunov functions.

03

Obtains these convergence results under relaxed assumptions on the moments of the stochastic errors.

Abstract

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0, \infty)$ and every arbitrarily large $p \in (0, \infty)$ that the considered SGD optimization algorithm converges in the strong $L^{p}$-sense with order $\frac{1}{2} - \varepsilon$ to the global minimum of the objective function of the considered stochastic approximation problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a…
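
To make the notion of a strong $L^{p}$ error concrete, the sketch below estimates $(\mathbb{E}[|\Theta_n - \vartheta|^p])^{1/p}$ by Monte Carlo for a toy one-dimensional problem and fits the empirical decay rate in $n$. Everything in it is an assumption chosen for illustration and is not taken from the paper: the quadratic objective $f(\theta) = \tfrac{1}{2}\mathbb{E}[(\theta - X)^2]$ with $X \sim \mathcal{N}(\vartheta, 1)$, the learning rates $c/n$, the symbols $\Theta_n$ and $\vartheta$, and the hypothetical helper name `sgd_strong_error`.

```python
import numpy as np


def sgd_strong_error(n_steps, p=4.0, n_paths=2000, c=1.0, theta_star=1.0, seed=0):
    """Monte Carlo estimate of (E|Theta_n - theta_star|^p)^(1/p) for toy SGD.

    Assumed toy setting (not from the paper): minimize
    f(theta) = 0.5 * E[(theta - X)^2] with X ~ N(theta_star, 1),
    whose global minimum is theta_star, using learning rates c / n.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_paths)  # one independent SGD run per path
    for n in range(1, n_steps + 1):
        x = rng.normal(theta_star, 1.0, size=n_paths)  # fresh sample per path
        theta -= (c / n) * (theta - x)  # unbiased stochastic gradient step
    # Strong L^p error: p-th root of the mean of |error|^p across paths.
    return float(np.mean(np.abs(theta - theta_star) ** p) ** (1.0 / p))


steps = [100, 400, 1600]
errors = [sgd_strong_error(n) for n in steps]
# The slope of log(error) against log(n) estimates the empirical order.
slope = np.polyfit(np.log(steps), np.log(errors), 1)[0]
print("strong L^4 errors:", errors)
print("fitted decay exponent:", slope)
```

In this toy case with `c=1.0` the iterate is just the running sample mean, so the fitted exponent should come out close to $-\tfrac{1}{2}$. That is consistent with the order $\tfrac{1}{2} - \varepsilon$ stated in the abstract, but it does not verify the general result.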
