Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case
Meixuan He, Yuqing Liang, Jinlan Liu, Dongpo Xu

TL;DR
This paper provides a comprehensive theoretical analysis of Adam's convergence in non-convex stochastic optimization, introducing relaxed hyperparameter conditions and establishing non-ergodic convergence results, including last-iterate convergence.
Contribution
It introduces weaker conditions for Adam's convergence, proves last-iterate convergence in non-convex settings, and establishes non-ergodic convergence rates under the PL condition.
Findings
Almost sure ergodic convergence rate close to o(1/√K)
First proof of last-iterate convergence of Adam to stationary points in the non-convex setting
Non-ergodic convergence rate of O(1/K) under PL condition
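For readers unfamiliar with the terms used in the findings above, the following is a sketch of the usual textbook forms of ergodic convergence, non-ergodic (last-iterate) convergence, and the Polyak-Lojasiewicz (PL) condition. The symbols f, x_k, f^*, and mu are assumptions introduced here for illustration. The paper's own definitions are described as more precise and more general, and the quantity to which the O(1/K) rate applies is not stated in this summary.

```latex
% Ergodic convergence: a statement about the best or the averaged
% iterate over the first K steps (assumed standard form).
\min_{1 \le k \le K} \|\nabla f(x_k)\|^2 \to 0
\quad \text{or} \quad
\frac{1}{K} \sum_{k=1}^{K} \|\nabla f(x_k)\|^2 \to 0
\qquad (K \to \infty).

% Non-ergodic (last-iterate) convergence: a statement about the
% final iterate x_K itself.
\|\nabla f(x_K)\|^2 \to 0 \qquad (K \to \infty).

% PL condition with constant \mu > 0, where f^* = \inf_x f(x).
\frac{1}{2} \|\nabla f(x)\|^2 \;\ge\; \mu \bigl( f(x) - f^* \bigr)
\qquad \text{for all } x.
```

The non-ergodic statement is the stronger one in practice, since training usually returns the last iterate rather than the best or an averaged one.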
Abstract
Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical application. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which…
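As a reminder of the algorithm under analysis, here is a minimal sketch of an Adam update in its commonly cited form with bias correction. The function names `adam_step` and `run_adam` are hypothetical, and the default values of `lr`, `beta1`, `beta2`, and `eps` are the widely used defaults, not the relaxed hyperparameter conditions from the paper. The exact variant the paper calls vanilla Adam may differ in details such as bias correction or the placement of `eps`.

```python
import math


def adam_step(x, grad, m, v, k, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of floats; k is the 1-based step index."""
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, grad, m, v):
        # Exponential moving averages of the gradient and its square.
        mi = beta1 * mi + (1.0 - beta1) * gi
        vi = beta2 * vi + (1.0 - beta2) * gi * gi
        # Bias correction for the zero initialization of m and v.
        m_hat = mi / (1.0 - beta1 ** k)
        v_hat = vi / (1.0 - beta2 ** k)
        # Coordinate-wise adaptive step.
        new_x.append(xi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


def run_adam(grad_fn, x0, steps, **hyper):
    """Run Adam for a fixed number of steps and return the last iterate."""
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for k in range(1, steps + 1):
        x, m, v = adam_step(x, grad_fn(x), m, v, k, **hyper)
    return x


if __name__ == "__main__":
    # Toy objective f(x) = x^2 with exact gradient 2x (an illustrative
    # assumption; the paper treats stochastic non-convex objectives).
    last = run_adam(lambda x: [2.0 * xi for xi in x], [1.0], steps=2000, lr=0.01)
    print(last)
```

Note that `run_adam` returns the last iterate, which is the quantity that non-ergodic results speak about; an ergodic result would instead concern the best or averaged iterate along the run.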
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Face and Expression Recognition · Advanced Bandit Algorithms Research
Methods: Adam
