Non-Convergence and Limit Cycles in the Adam optimizer
Sebastian Bock, Martin Georg Weiß

TL;DR
This paper analyzes the convergence of the Adam optimizer in batch mode with bias correction and shows that, even for simple quadratic objective functions, limit cycles of period 2 exist for every choice of hyperparameters. Earlier results had found such cycles only for atypical hyperparameters and only for the algorithm without bias correction.
Contribution
It extends the convergence analysis of Adam to include bias correction in batch mode, demonstrating the existence of 2-limit-cycles for all hyperparameters in quadratic functions.
Findings
Limit cycles of period 2 exist for Adam with bias correction.
These cycles occur for all hyperparameter choices in quadratic cases.
Stability analysis of these cycles is provided.
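The setting described in these findings can be explored numerically. The following is a minimal sketch, not the authors' code: it runs Adam with bias correction in batch mode (full gradient at every step) on a scalar quadratic f(x) = c·x²/2 and reports how far the late iterates move over one step and over two steps. The function names `adam_batch`, `quadratic_grad` and `tail_diagnostics`, the default hyperparameters, and the placement of epsilon outside the square root (as in Kingma and Ba's formulation) are assumptions made for illustration; the paper may use a different variant.

```python
import math


def adam_batch(grad, x0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Run Adam with bias correction on a scalar objective; return all iterates."""
    x, m, v = x0, 0.0, 0.0
    xs = [x]
    for t in range(1, steps + 1):
        g = grad(x)                          # batch mode: exact gradient
        m = beta1 * m + (1.0 - beta1) * g    # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)       # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        # Assumption: epsilon added outside the square root.
        x = x - alpha * m_hat / (math.sqrt(v_hat) + eps)
        xs.append(x)
    return xs


def quadratic_grad(x, c=1.0):
    """Gradient of the hypothetical test objective f(x) = c * x**2 / 2."""
    return c * x


def tail_diagnostics(xs, tail=100):
    """Return (max |x[t+1]-x[t]|, max |x[t+2]-x[t]|) over the last iterates.

    A vanishing one-step change suggests convergence to a point; a one-step
    change that stays bounded away from zero while the two-step change
    vanishes is the numerical signature of a period-2 cycle.
    """
    n = len(xs)
    start = max(0, n - tail - 2)
    step1 = max(abs(xs[i + 1] - xs[i]) for i in range(start, n - 2))
    step2 = max(abs(xs[i + 2] - xs[i]) for i in range(start, n - 2))
    return step1, step2


if __name__ == "__main__":
    iterates = adam_batch(quadratic_grad, x0=1.0, steps=20000)
    print(tail_diagnostics(iterates))
```

Whether a finite run actually settles on a cycle depends on the starting point and on the stability of the cycle, so the printed diagnostics are only a heuristic check and do not substitute for the paper's analysis.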
Abstract
One of the most popular training algorithms for deep neural networks is Adaptive Moment Estimation (Adam), introduced by Kingma and Ba. Despite its success in many applications, there is no satisfactory convergence analysis: only local convergence can be shown for batch mode under some restrictions on the hyperparameters, and counterexamples exist for incremental mode. Recent results show that for simple quadratic objective functions limit cycles of period 2 exist in batch mode, but only for atypical hyperparameters, and only for the algorithm without bias correction. We extend the convergence analysis for Adam in the batch mode with bias…
Taxonomy
Methods: Adam
