Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example

Ph. L. Toint

arXiv:2308.00720·cs.LG·August 3, 2023

TL;DR

This paper constructs a simple one-dimensional example on which the ADAM optimization algorithm, run with a constant stepsize and without noise on the gradient, diverges for any choice of the method parameters.

Contribution

It provides a unidimensional function with Lipschitz continuous gradient on which ADAM with fixed stepsize, started from the origin, diverges regardless of parameter choices.

Findings

01

ADAM with constant stepsize diverges on a simple unidimensional function with Lipschitz continuous gradient.

02

Divergence occurs regardless of parameter settings.

03

Divergence occurs in the absence of noise on the gradient, so it cannot be attributed to stochastic gradient error.

Abstract

A very simple unidimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method parameters.
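
For readers unfamiliar with the iteration the abstract refers to, here is a minimal sketch of ADAM in one dimension with a constant stepsize and an exact (noise-free) gradient. It follows the commonly used form with bias correction, which is an assumption: the paper's exact variant may differ. The function name `adam_1d`, its default parameter values, and the quadratic used in the usage lines are illustrative choices, and the quadratic is not the counterexample constructed in the paper.

```python
import math


def adam_1d(grad, x0=0.0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Run ADAM with constant stepsize `alpha` on a 1-D problem.

    `grad` is a callable returning the exact gradient at x (no noise).
    Returns the list of iterates, starting with x0.
    """
    x, m, v = x0, 0.0, 0.0
    iterates = [x]
    for k in range(1, steps + 1):
        g = grad(x)
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction (assumed here; some variants omit it).
        m_hat = m / (1.0 - beta1 ** k)
        v_hat = v / (1.0 - beta2 ** k)
        # Constant stepsize: alpha is never decreased.
        x = x - alpha * m_hat / (math.sqrt(v_hat) + eps)
        iterates.append(x)
    return iterates


# Illustrative usage on f(x) = (x - 1)^2 / 2, started from the origin.
# This placeholder is NOT the function constructed in the paper.
xs = adam_1d(lambda x: x - 1.0, x0=0.0, alpha=0.1, steps=50)
print(xs[:3])
```

On the very first step the bias-corrected ratio reduces to roughly the sign of the gradient, so the first move has length close to `alpha` whatever the gradient's magnitude. This scale invariance is a well-known property of the update and is independent of the paper's construction.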

Taxonomy

Topics: Mathematical Biology Tumor Growth · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Adam