Strong error analysis for stochastic gradient descent optimization algorithms
Arnulf Jentzen, Benno Kuckuck, Ariel Neufeld, Philippe von Wurstemberger

TL;DR
This paper provides a rigorous strong error analysis for stochastic gradient descent (SGD) optimization algorithms. It proves convergence to the global minimum in the strong L^p sense with order 1/2 - epsilon under standard convexity-type assumptions, using techniques from the theory of Lyapunov-type functions.
Contribution
It introduces a convergence framework for SGD based on Lyapunov-type functions and uses it to establish strong L^p convergence with order 1/2 - epsilon for every arbitrarily large p and every arbitrarily small epsilon.
Findings
Proves strong L^p convergence of SGD with order 1/2 - epsilon.
Develops a general convergence machinery using Lyapunov functions.
Obtains these convergence results under relaxed assumptions on the moments of the stochastic errors in the SGD algorithm.
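To make the notion of a strong L^p convergence order concrete, the following is a minimal numerical sketch and not the paper's method or setting. It assumes a hypothetical one-dimensional toy problem, minimizing f(theta) = 0.5 * E[(theta - X)^2] with X normally distributed around an assumed minimizer of 1.0, and an assumed learning rate of 1/n. The function names, the choice p = 4, and the step counts are all illustrative. The strong L^p error (E[|Theta_n - minimizer|^p])^(1/p) is estimated by Monte Carlo, and the empirical order is read off from a log-log slope.

```python
import math
import random


def sgd_final_iterate(n_steps, theta0, minimizer, rng):
    """Run SGD on the toy objective f(theta) = 0.5 * E[(theta - X)^2].

    X ~ N(minimizer, 1) is an assumption of this sketch; the learning
    rate 1/n is likewise an illustrative choice.
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        x = rng.gauss(minimizer, 1.0)
        grad = theta - x            # unbiased estimate of f'(theta)
        theta -= (1.0 / n) * grad   # SGD step with learning rate 1/n
    return theta


def strong_lp_error(n_steps, p, n_paths, seed=0, theta0=0.0, minimizer=1.0):
    """Monte Carlo estimate of (E[|Theta_n - minimizer|^p])^(1/p)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        theta = sgd_final_iterate(n_steps, theta0, minimizer, rng)
        total += abs(theta - minimizer) ** p
    return (total / n_paths) ** (1.0 / p)


def empirical_order(p=4, steps=(100, 400, 1600), n_paths=500, seed=0):
    """Estimate the convergence order from the first and last step counts."""
    errors = [strong_lp_error(n, p, n_paths, seed) for n in steps]
    slope = -(math.log(errors[-1]) - math.log(errors[0])) / (
        math.log(steps[-1]) - math.log(steps[0])
    )
    return errors, slope


if __name__ == "__main__":
    errors, slope = empirical_order()
    print("estimated strong L^4 errors:", errors)
    print("empirical convergence order:", slope)
```

In this toy case the iterate with learning rate 1/n coincides with the running sample mean, so the observed order should sit near 1/2. The paper's result concerns order 1/2 - epsilon under its own, more general assumptions, which this sketch does not attempt to reproduce.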
Abstract
Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small epsilon in (0, infinity) and every arbitrarily large p in (0, infinity) that the considered SGD optimization algorithm converges in the strong L^p-sense with order 1/2 - epsilon to the global minimum of the objective function of the considered stochastic approximation problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a…
