Adam symmetry theorem: characterization of the convergence of the stochastic Adam optimizer
Steffen Dereich, Thang Do, Arnulf Jentzen, Philippe von Wurstemberger

TL;DR
This paper rigorously analyzes the convergence of the Adam optimizer for strongly convex problems, establishing rates and conditions under which Adam converges, and introduces the Adam symmetry theorem highlighting the importance of data distribution symmetry.
Contribution
It provides the first rigorous convergence rates for Adam on strongly convex problems and introduces the Adam symmetry theorem showing convergence depends on data symmetry.
Findings
Convergence rate 1/2 with respect to the size of the learning rate
Convergence rate 1 with respect to the size of the mini-batches
Convergence rate 1 with respect to the distance of the second moment parameter to 1
Abstract
Besides the standard stochastic gradient descent (SGD) method, the Adam optimizer due to Kingma & Ba (2014) is currently probably the best-known optimization method for the training of deep neural networks in artificial intelligence (AI) systems. Despite the popularity and success of Adam, it remains an open research problem to provide a rigorous convergence analysis for Adam, even for the class of strongly convex stochastic optimization problems (SOPs). In one of the main results of this work we establish convergence rates for Adam in terms of the number of gradient steps (convergence rate 1/2 w.r.t. the size of the learning rate), the size of the mini-batches (convergence rate 1 w.r.t. the size of the mini-batches), and the size of the second moment parameter of Adam (convergence rate 1 w.r.t. the distance of the second moment parameter to 1) for the class of strongly convex SOPs. In a further main…
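To make the three quantities in the abstract concrete (learning rate, mini-batch size, and second moment parameter), here is a minimal sketch of the standard Adam update from Kingma & Ba applied to a toy strongly convex problem, minimizing E[(theta - X)^2]/2 over a scalar theta. The toy objective, the function name `adam_quadratic`, the sampler `symmetric_sample`, and all default values are assumptions chosen for illustration. They are not taken from the paper, and the sketch does not reproduce its analysis.

```python
import math
import random


def adam_quadratic(sample, theta0=0.0, lr=0.01, beta1=0.9, beta2=0.999,
                   eps=1e-8, batch_size=32, steps=2000, seed=0):
    """Run scalar Adam on f(theta) = E[(theta - X)^2] / 2.

    `sample(rng)` draws one data point X; the minimizer of f is E[X].
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Mini-batch gradient of the quadratic loss: mean of (theta - x).
        batch = [sample(rng) for _ in range(batch_size)]
        grad = sum(theta - x for x in batch) / batch_size
        # Exponential moving averages of the gradient and squared gradient.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad * grad
        # Bias correction for the zero initialization of m and v.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def symmetric_sample(rng):
    # Hypothetical data distribution: uniform on [0, 2], symmetric about its mean 1.0.
    return rng.uniform(0.0, 2.0)


theta_final = adam_quadratic(symmetric_sample)
print(f"final iterate: {theta_final:.4f} (minimizer is 1.0)")
```

Varying `lr`, `batch_size`, and `beta2` in this sketch is one way to explore informally how the distance to the minimizer responds to each parameter. Replacing `symmetric_sample` with a skewed sampler gives a simple setting for experimenting with the role of data symmetry that the summary attributes to the Adam symmetry theorem.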
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Neural Networks and Applications
