Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case

Meixuan He; Yuqing Liang; Jinlan Liu; Dongpo Xu

arXiv:2307.11782·math.OC·February 12, 2025·1 cites

Open Access

TL;DR

This paper provides a comprehensive theoretical analysis of Adam's convergence in non-convex stochastic optimization, introducing relaxed hyperparameter conditions and establishing non-ergodic convergence results, including last-iterate convergence.

Contribution

It introduces weaker conditions for Adam's convergence, proves last-iterate convergence in non-convex settings, and establishes non-ergodic convergence rates under the PL condition.

Findings

01. Almost sure ergodic convergence rate close to o(1/√K)

02. First proof of last-iterate convergence to stationary points

03. Non-ergodic convergence rate of O(1/K) under the PL condition
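For orientation, the findings above refer to the Polyak-Łojasiewicz (PL) condition and to ergodic versus non-ergodic convergence. The paper's own definitions are not reproduced on this page, so the block below gives only the standard textbook forms, with the symbols f, f^*, \mu, x_k and K introduced here as assumptions.

```latex
% PL condition with constant \mu > 0, where f^* = \inf_x f(x):
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)
\quad \text{for all } x .

% Typical ergodic-type quantities (best or averaged iterate over K steps):
\min_{1 \le k \le K} \|\nabla f(x_k)\|^2
\qquad \text{or} \qquad
\frac{1}{K} \sum_{k=1}^{K} \|\nabla f(x_k)\|^2 .

% Typical non-ergodic (last-iterate) quantities:
\|\nabla f(x_K)\|^2
\qquad \text{or} \qquad
f(x_K) - f^* .
```

A bound on an ergodic quantity says only that some iterate, or the iterates on average, has a small gradient. A bound on a non-ergodic quantity applies to the iterate actually returned at step K, which is why last-iterate results are usually considered the stronger statement.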

Abstract

Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical application. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which…
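As a point of reference for the algorithm the abstract discusses, here is a minimal NumPy sketch of the widely used Adam update with bias correction. It is an illustration under stated assumptions, not the paper's exact formulation: the function names `adam_step`, `run_adam`, and `toy_grad` are hypothetical, the default hyperparameters are the common ones rather than the relaxed settings the paper analyzes, and the toy objective f(x) = x² + 3 sin²(x) is chosen here only because it is often cited as a non-convex function satisfying the PL inequality.

```python
import numpy as np


def adam_step(x, grad, m, v, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; k is the 1-based step index."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** k)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** k)              # bias-corrected second moment
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v


def run_adam(grad_fn, x0, steps, **hyper):
    """Run Adam for a fixed number of steps and return the last iterate."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for k in range(1, steps + 1):
        x, m, v = adam_step(x, grad_fn(x), m, v, k, **hyper)
    return x


def toy_grad(x):
    """Gradient of the assumed toy objective f(x) = x**2 + 3*sin(x)**2."""
    return 2.0 * x + 3.0 * np.sin(2.0 * x)


if __name__ == "__main__":
    # Last iterate after 200 steps, i.e. the non-ergodic quantity of interest.
    print(run_adam(toy_grad, [2.0], steps=200, lr=0.01))
```

Note that `run_adam` returns only the final iterate rather than an average or the best iterate seen so far; that is the object a last-iterate (non-ergodic) guarantee speaks about. The gradient here is deterministic, whereas the paper treats the stochastic setting.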

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Face and Expression Recognition · Advanced Bandit Algorithms Research

Methods: Adam