Convergence rates for the Adam optimizer
Steffen Dereich, Arnulf Jentzen

TL;DR
This paper establishes the first optimal convergence rates for the Adam optimizer in strongly convex stochastic optimization problems, providing theoretical insights into its behavior and convergence properties.
Contribution
The paper introduces a new convergence analysis for Adam, showing that it converges, with optimal rates, to zeros of a new vector field (the Adam vector field) rather than to zeros of the gradient of the objective function.
Findings
Adam converges to zeros of the Adam vector field, not to zeros of the gradient of the objective function.
The analysis provides optimal convergence rates for Adam in quadratic stochastic problems.
Adam's convergence targets can therefore differ from those of plain gradient descent, which seeks zeros of the gradient.
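For orientation, the update rule under discussion can be sketched in Python. This is a minimal illustration of the standard Adam recursion from Kingma & Ba (2014) applied to a hypothetical one-dimensional quadratic stochastic problem, not a reproduction of the paper's analysis. The objective, the Gaussian noise model, the decaying learning-rate schedule `lr0 / sqrt(t)`, and the names `adam_step` and `run_adam` are all assumptions made for this example, and the sketch does not compute the Adam vector field itself.

```python
import math
import random


def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update for a scalar parameter (Kingma & Ba, 2014)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero initialization of m and v (t starts at 1).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Parameter update scaled by the root of the second-moment estimate.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def run_adam(n_steps=5000, target=2.0, noise_std=1.0, lr0=0.1, seed=0):
    """Run Adam on the toy objective f(theta) = E[(theta - X)^2] / 2.

    Assumption for this sketch: X ~ Normal(target, noise_std^2), so f is
    strongly convex and quadratic with minimizer `target`.
    """
    rng = random.Random(seed)
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, n_steps + 1):
        sample = rng.gauss(target, noise_std)
        grad = theta - sample      # unbiased stochastic gradient of f
        lr = lr0 / math.sqrt(t)    # hypothetical decaying learning rate
        theta, m, v = adam_step(theta, grad, m, v, t, lr)
    return theta


if __name__ == "__main__":
    print(run_adam())
```

Calling `run_adam()` returns the final iterate, which can be compared with `target`, the minimizer of this toy objective. A symmetric scalar example like this one only shows the mechanics of the recursion; it is not designed to exhibit the gap between zeros of the Adam vector field and zeros of the gradient.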
Abstract
Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, usually not the plain vanilla standard SGD method is the employed optimization scheme but instead suitably accelerated and adaptive SGD optimization methods are applied. As of today, maybe the most popular variant of such accelerated and adaptive SGD optimization methods is the famous Adam optimizer proposed by Kingma & Ba in 2014. Despite the popularity of the Adam optimizer in implementations, it remained an open problem of research to provide a convergence analysis for the Adam optimizer even in the situation of simple quadratic stochastic optimization problems where the objective function (the function one intends to minimize) is strongly convex. In this work we solve…
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research · Advanced Differential Equations and Dynamical Systems · Polynomial and algebraic computation
Methods: Stochastic Gradient Descent · Adam
