ODE approximation for the Adam algorithm: General and overparametrized setting

Steffen Dereich; Arnulf Jentzen; Sebastian Kassing

arXiv:2511.04622·math.OC·November 7, 2025

Open Access

TL;DR

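This paper develops an ODE-based framework for analyzing the Adam optimizer in a fast-slow scaling regime with fixed momentum parameters and vanishing step sizes. In a very general setting, any limit of Adam must be a zero of the Adam vector field rather than a local minimizer or critical point of the objective; in the overparametrized empirical risk minimization setting, Adam is able to locally find the set of minima.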

Contribution

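It introduces an ODE approximation for Adam, showing that the algorithm is an asymptotic pseudo-trajectory of the flow of the Adam vector field, and uses properties of asymptotic pseudo-trajectories to derive convergence results in both a general setting and the overparametrized empirical risk minimization setting.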

Findings

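01. For fixed momentum parameters and vanishing step sizes, Adam is an asymptotic pseudo-trajectory of the flow of a particular vector field, the Adam vector field.

02. If Adam converges, its limit must be a zero of the Adam vector field, rather than a local minimizer or critical point of the objective function.

03. In the overparametrized empirical risk minimization setting, Adam is able to locally find the set of minima.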
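
For readers unfamiliar with the term used in the first finding, the standard definition of an asymptotic pseudo-trajectory (in the sense of Benaïm and Hirsch) can be written as below. The symbols X, \Phi, and T are notation chosen here for illustration; the paper's exact formulation, including the state space on which the Adam vector field acts, may differ.

```latex
% Standard definition; notation is illustrative, not taken from the paper.
% \Phi = (\Phi_t)_{t \ge 0} : flow generated by a vector field on \mathbb{R}^d
% X : [0,\infty) \to \mathbb{R}^d : a continuous path (e.g. interpolated iterates)
X \text{ is an asymptotic pseudo-trajectory of } \Phi
\iff
\forall\, T > 0 : \quad
\lim_{t \to \infty} \; \sup_{0 \le h \le T}
\bigl\| X(t+h) - \Phi_h\bigl(X(t)\bigr) \bigr\| = 0 .
```

In stochastic approximation, X is typically the interpolation of the iterates on the time grid t_n = \gamma_1 + \dots + \gamma_n defined by the step sizes \gamma_n; that the paper uses exactly this interpolation is an assumption here.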

Abstract

The Adam optimizer is currently presumably the most popular optimization method in deep learning. In this article we develop an ODE based method to study the Adam optimizer in a fast-slow scaling regime. For fixed momentum parameters and vanishing step-sizes, we show that the Adam algorithm is an asymptotic pseudo-trajectory of the flow of a particular vector field, which is referred to as the Adam vector field. Leveraging properties of asymptotic pseudo-trajectories, we establish convergence results for the Adam algorithm. In particular, in a very general setting we show that if the Adam algorithm converges, then the limit must be a zero of the Adam vector field, rather than a local minimizer or critical point of the objective function. In contrast, in the overparametrized empirical risk minimization setting, the Adam algorithm is able to locally find the set of minima. Specifically,…
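
The regime described in the abstract (fixed momentum parameters, vanishing step sizes) can be illustrated with a minimal sketch of the standard Adam recursion. Everything specific below is an assumption made for illustration and not taken from the paper: the function names `adam_vanishing_steps` and `toy_grad`, the step-size schedule gamma_n = gamma0 * n^(-0.75), the use of exact (non-stochastic) gradients, and the toy objective, which has one data point and two parameters so that its minima form a line rather than a single point.

```python
import math


def adam_vanishing_steps(grad, theta0, n_steps,
                         beta1=0.9, beta2=0.999, eps=1e-8,
                         gamma0=0.1, exponent=0.75):
    """Standard Adam with fixed (beta1, beta2) and step sizes
    gamma_n = gamma0 * n ** (-exponent), which tend to zero.

    The schedule and all default values are illustrative assumptions.
    """
    theta = list(theta0)
    m = [0.0] * len(theta)  # first-moment (momentum) estimate
    v = [0.0] * len(theta)  # second-moment estimate
    for n in range(1, n_steps + 1):
        g = grad(theta)
        gamma = gamma0 * n ** (-exponent)  # vanishing step size
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** n)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** n)
            theta[i] -= gamma * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def toy_grad(theta):
    """Gradient of 0.5 * (theta[0] + theta[1] - 1) ** 2.

    Hypothetical overparametrized toy problem: one sample, two
    parameters. Every point on the line theta[0] + theta[1] = 1
    is a global minimum.
    """
    residual = theta[0] + theta[1] - 1.0
    return [residual, residual]


if __name__ == "__main__":
    theta = adam_vanishing_steps(toy_grad, [0.3, -0.2], n_steps=5000)
    print("theta:", theta)
    print("residual:", theta[0] + theta[1] - 1.0)
```

In this toy problem both coordinates receive the same gradient, so the iterates move toward the line of minima while preserving the difference between the two coordinates. Which point on that line is approached therefore depends on the initialization, which is one simple way to see why results in the overparametrized setting are naturally stated in terms of a set of minima.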

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing