Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example
Ph. L. Toint

TL;DR
This paper constructs a simple one-dimensional example on which the ADAM optimization algorithm with a fixed stepsize diverges, even when the gradient is exact (noise-free), exposing a stability limitation of the method.
Contribution
It provides a straightforward unidimensional example showing divergence of ADAM with fixed stepsize, regardless of parameter choices.
Findings
ADAM diverges on a simple function with Lipschitz continuous gradient.
Divergence occurs regardless of parameter settings.
The example clarifies stability limitations of ADAM.
Abstract
A very simple unidimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method parameters.
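For readers who want to experiment with the setting described in the abstract, the following is a minimal sketch of ADAM with a constant stepsize in one dimension, started from the origin and using exact gradients. It assumes the standard bias-corrected recurrences of Kingma and Ba; the variant analyzed in the paper may differ in details. The names `adam_1d` and `placeholder_grad` are hypothetical, and the quadratic used here is only a stand-in: the paper's divergent function is not reproduced on this page.

```python
import math


def adam_1d(grad, x0=0.0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Run ADAM with a constant stepsize alpha on a 1-D problem.

    grad is a function returning the exact (noise-free) derivative.
    Returns the list of iterates [x_0, x_1, ..., x_steps].
    """
    x, m, v = x0, 0.0, 0.0
    iterates = [x]
    for k in range(1, steps + 1):
        g = grad(x)                          # exact gradient, no noise
        m = beta1 * m + (1 - beta1) * g      # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** k)         # bias corrections
        v_hat = v / (1 - beta2 ** k)
        x = x - alpha * m_hat / (math.sqrt(v_hat) + eps)
        iterates.append(x)
    return iterates


def placeholder_grad(x):
    # Assumption: derivative of the stand-in objective 0.5 * (x - 1)**2,
    # NOT the function constructed in the paper.
    return x - 1.0


iterates = adam_1d(placeholder_grad, steps=200)
```

Substituting another derivative for `placeholder_grad` lets one inspect the iterates for different objectives and parameter choices. One property visible in this sketch is that the first step has length close to `alpha` whatever the size of the initial gradient, because the bias-corrected ratio `m_hat / sqrt(v_hat)` has magnitude one at the first iteration.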
Taxonomy
Topics: Mathematical Biology Tumor Growth · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Adam
