On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

TL;DR
This paper establishes convergence guarantees for a broad class of Adam-type algorithms in non-convex stochastic optimization, providing conditions for their convergence and analyzing their rates.
Contribution
It introduces mild sufficient conditions ensuring convergence of Adam-type algorithms and analyzes their convergence rates in non-convex settings.
Findings
Adam-type algorithms can achieve an $O(\frac{\log T}{\sqrt{T}})$ convergence rate.
Violating the conditions may lead to divergence of these algorithms.
The results extend to deterministic incremental adaptive gradient methods.
Abstract
This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as "Adam-type", includes popular algorithms such as Adam, AMSGrad, and AdaGrad. Despite their popularity in training deep neural networks, the convergence of these algorithms for solving nonconvex problems remains an open question. This paper provides a set of mild sufficient conditions that guarantee convergence of the Adam-type methods. We prove that, under our derived conditions, these methods can achieve a convergence rate of order $O(\frac{\log T}{\sqrt{T}})$ for nonconvex stochastic optimization. We show that the conditions are essential in the sense that violating them may make the algorithm diverge. Moreover, we propose and analyze a class of (deterministic) incremental adaptive…
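As a rough illustration of what "Adam-type" means here, the sketch below implements one generic update in which a momentum term sets the search direction and a function of past squared gradients sets the per-coordinate learning rate. It is a minimal sketch under stated assumptions, not the authors' reference implementation: the function names (`adam_type_step`, `toy_grad`, `run`), the hyperparameter defaults, the $\alpha/\sqrt{t}$ step-size schedule, the `eps` safeguard, and the toy nonconvex objective are all chosen for illustration.

```python
import numpy as np


def adam_type_step(x, m, v, v_hat, grad, t, alpha=0.1, beta1=0.9,
                   beta2=0.999, variant="amsgrad", eps=1e-8):
    """One generic Adam-type update (illustrative sketch, hypothetical API).

    m     : momentum (search direction), built from past gradients
    v     : second-moment statistic, built from past squared gradients
    v_hat : the statistic actually used to scale the step
    t     : 1-based iteration counter
    """
    # Search direction: exponential moving average of stochastic gradients.
    m = beta1 * m + (1.0 - beta1) * grad

    if variant == "adagrad":
        # Running average of all squared gradients seen so far.
        v = ((t - 1) * v + grad ** 2) / t
        v_hat = v
    else:
        # Exponential moving average of squared gradients (Adam, AMSGrad).
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # AMSGrad keeps the elementwise maximum so v_hat never decreases.
        v_hat = np.maximum(v_hat, v) if variant == "amsgrad" else v

    # Assumed diminishing step size alpha / sqrt(t); eps avoids division by zero.
    step = alpha / np.sqrt(t)
    x = x - step * m / (np.sqrt(v_hat) + eps)
    return x, m, v, v_hat


def toy_grad(x):
    """Gradient of the toy nonconvex objective f(x) = sum(x^2 / (1 + x^2))."""
    return 2.0 * x / (1.0 + x ** 2) ** 2


def run(variant, steps=200, noise=0.1, seed=0):
    """Run one variant on the toy problem with noisy gradients."""
    rng = np.random.default_rng(seed)
    x = np.array([2.0, -1.5])
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    # AdaGrad is treated here as the no-momentum member of the class.
    beta1 = 0.0 if variant == "adagrad" else 0.9
    for t in range(1, steps + 1):
        grad = toy_grad(x) + noise * rng.standard_normal(x.shape)
        x, m, v, v_hat = adam_type_step(x, m, v, v_hat, grad, t,
                                        beta1=beta1, variant=variant)
    return x


if __name__ == "__main__":
    for name in ("adam", "amsgrad", "adagrad"):
        x_final = run(name)
        print(name, "final gradient norm:", np.linalg.norm(toy_grad(x_final)))
```

The three variants differ only in how `v_hat` is formed, which is the sense in which a single analysis can cover the whole class. The printed gradient norms are only a sanity check on a toy problem and say nothing about the rates proved in the paper.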
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: Adam · AdaGrad
