A Novel Convergence Analysis for Algorithms of the Adam Family
Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang

TL;DR
This paper provides a simple, generic convergence analysis for Adam-style algorithms, covering many variants and broad non-convex problems, under mild assumptions on stochastic gradients and momentum parameters.
Contribution
It introduces a unified convergence proof for Adam and its variants, applicable to complex non-convex optimization problems with minimal assumptions.
Findings
Convergence holds for Adam, AMSGrad, and AdaBound under mild conditions.
The analysis requires only a large momentum parameter and a bounded adaptive step size.
It applies to non-convex problems such as min-max, compositional, and bilevel optimization.
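To make the two conditions in the findings concrete, the following is a minimal sketch of one Adam-style update in plain Python. It is an illustration only, not the algorithm or notation analyzed in the paper: the function name `adam_family_step`, the `state` dictionary, and the clipping bounds `s_min` and `s_max` are hypothetical names introduced here, bias correction is omitted for brevity, and clipping is just one simple way to keep the adaptive scaling bounded.

```python
import math


def adam_family_step(x, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
                     eps=1e-8, variant="adam", s_min=0.01, s_max=100.0):
    """One update of a generic Adam-style method on a list of floats.

    state holds per-coordinate lists: m (moving average of gradients),
    v (moving average of squared gradients) and v_max (running maximum
    of v, used only when variant == "amsgrad").
    All names here are illustrative assumptions, not the paper's notation.
    """
    new_x = []
    for i, g in enumerate(grad):
        # Momentum term: moving average of the stochastic gradient.
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        # Moving average of the squared gradient.
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        if variant == "amsgrad":
            # AMSGrad uses the largest second-moment estimate seen so far.
            state["v_max"][i] = max(state["v_max"][i], state["v"][i])
            second = state["v_max"][i]
        else:
            second = state["v"][i]
        # Adaptive scaling factor, clipped into [s_min, s_max] so that it
        # stays bounded above and below (an assumption of this sketch).
        scale = 1.0 / (math.sqrt(second) + eps)
        scale = min(max(scale, s_min), s_max)
        new_x.append(x[i] - lr * scale * state["m"][i])
    return new_x
```

A caller would initialize `state = {"m": [0.0] * n, "v": [0.0] * n, "v_max": [0.0] * n}` for an `n`-dimensional problem and invoke the function once per stochastic gradient. Switching `variant` changes only how the second-moment estimate is chosen, which is the sense in which one template covers several members of the family.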
Abstract
Since its invention in 2014, the Adam optimizer has received tremendous attention. On one hand, it has been widely used in deep learning and many variants have been proposed; on the other hand, their theoretical convergence properties remain a mystery. The state of the theory is far from satisfactory: some studies require strong assumptions about the updates, which are not necessarily applicable in practice, while other studies still follow the original problematic convergence analysis of Adam, which was shown to be insufficient to ensure convergence. Although rigorous convergence analyses exist for Adam, they impose specific requirements on the update of the adaptive step size, which are not generic enough to cover many other variants of Adam. To address these issues, in this extended abstract, we present a simple and generic proof of convergence for a family of Adam-style…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: AMSGrad · Adam
