A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD

Ruinan Jin; Xiao Li; Yaoliang Yu; Baoxiang Wang

arXiv:2410.04458 · cs.LG · May 21, 2025

TL;DR

This paper introduces a new framework for analyzing Adam's convergence, demonstrating that it converges under relaxed assumptions similar to those used for SGD, and providing both asymptotic and non-asymptotic guarantees.

Contribution

The paper develops a comprehensive framework that proves Adam's convergence under standard SGD-like assumptions, bridging the theoretical gap with SGD.

Findings

01. Adam achieves asymptotic convergence in both the almost sure and \(L_1\) senses.

02. Adam attains non-asymptotic sample complexity bounds comparable to those of SGD.

03. Convergence is established under relaxed assumptions such as L-smoothness and the ABC inequality.
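
For orientation, the two assumptions named in the last finding are usually written as follows in the SGD literature. This is a sketch of the common textbook forms, not the paper's own statement, which may differ in constants or details. The symbols \(f\) (objective), \(g(x)\) (unbiased stochastic gradient at \(x\)), \(f^{\inf}\) (a lower bound on \(f\)), and the constants \(L, A, B, C \ge 0\) are notation assumed here.

\[
\|\nabla f(x) - \nabla f(y)\| \le L \,\|x - y\| \quad \text{for all } x, y \qquad \text{(L-smoothness)}
\]

\[
\mathbb{E}\big[\|g(x)\|^2\big] \le 2A\,\big(f(x) - f^{\inf}\big) + B\,\|\nabla f(x)\|^2 + C \qquad \text{(ABC inequality)}
\]

Setting \(A = 0\) and \(B = 1\) recovers the bounded-variance condition (variance at most \(C\)), so the ABC inequality is weaker than requiring uniformly bounded stochastic gradients.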

Abstract

Adaptive Moment Estimation (Adam) is a cornerstone optimization algorithm in deep learning, widely recognized for its flexibility with adaptive learning rates and efficiency in handling large-scale data. However, despite its practical success, the theoretical understanding of Adam's convergence has been constrained by stringent assumptions, such as almost surely bounded stochastic gradients or uniformly bounded gradients, which are more restrictive than those typically required for analyzing stochastic gradient descent (SGD). In this paper, we introduce a novel and comprehensive framework for analyzing the convergence properties of Adam. This framework offers a versatile approach to establishing Adam's convergence. Specifically, we prove that Adam achieves asymptotic (last iterate sense) convergence in both the almost sure sense and the \(L_1\) sense under the relaxed assumptions…
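
To make the object of the analysis concrete, here is a minimal sketch of one Adam update in its widely used bias-corrected form. It is an illustration only: the paper may analyze a variant with different step-size or hyperparameter schedules. The function name `adam_step`, the default hyperparameters, and the toy objective in the demo are assumptions introduced here, not taken from the paper.

```python
import math
import random


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    t is the 1-based step counter used for bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialization of m and v
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-coordinate adaptive step
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    # Toy problem (hypothetical): f(x) = x^2 with Gaussian gradient noise.
    # Gaussian noise is not almost surely bounded, yet has bounded variance.
    rng = random.Random(0)
    x, m, v = [1.0], [0.0], [0.0]
    for t in range(1, 2001):
        noisy_grad = [2.0 * x[0] + rng.gauss(0.0, 0.1)]
        x, m, v = adam_step(x, noisy_grad, m, v, t, lr=1e-2)
    print(x[0])
```

On the first step the bias-corrected ratio reduces to roughly `g / |g|`, so each coordinate moves by about `lr` regardless of the gradient's scale. This per-coordinate normalization is what distinguishes Adam from plain SGD.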

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference

Methods: Adam · Stochastic Gradient Descent · Approximate Bayesian Computation