Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator
Zhishuai Guo, Yi Xu, Wotao Yin, Rong Jin, Tianbao Yang

TL;DR
This paper offers a new non-convex convergence analysis for adaptive optimization algorithms, revealing how increasing momentum can ensure convergence and extending the analysis to complex problems like min-max and bilevel optimization.
Contribution
It introduces a novel non-convex convergence analysis for adaptive algorithms, highlighting the role of momentum and enabling extensions to advanced optimization problems.
Findings
Increasing momentum ensures convergence with bounded adaptive step sizes.
Stage-wise increasing momentum improves practical convergence.
The analysis yields algorithms for non-convex min-max and bilevel problems that need neither large batches nor double loops.
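To make the first finding concrete, here is a minimal sketch of an Adam-style update whose adaptive scaling factor is clipped into a fixed range, paired with a momentum parameter that grows over iterations. It is an illustration under stated assumptions, not the paper's algorithm: the names `bounded_scale`, `adam_style_step`, and `increasing_beta1`, the clipping range, and the `1 - 1/sqrt(t + 1)` schedule are all hypothetical choices made for this example.

```python
import math


def bounded_scale(u, lower=0.1, upper=10.0, eps=1e-8):
    """Adaptive scaling factor 1 / (sqrt(u) + eps), clipped into [lower, upper].

    The clipping range is an assumption for illustration; it stands in for
    the "bounded adaptive scaling factor" condition.
    """
    s = 1.0 / (math.sqrt(u) + eps)
    return min(max(s, lower), upper)


def increasing_beta1(t, beta_max=0.999):
    """Hypothetical increasing momentum schedule: 1 - 1/sqrt(t + 1), capped."""
    return min(1.0 - 1.0 / math.sqrt(t + 1), beta_max)


def adam_style_step(x, m, u, grad, lr, beta1, beta2=0.999):
    """One Adam-style update on plain lists of floats.

    m is the moving average of gradients (first-order moment),
    u is the moving average of squared gradients (second-order moment).
    Returns the updated (x, m, u) without modifying the inputs.
    """
    new_x, new_m, new_u = [], [], []
    for xi, mi, ui, gi in zip(x, m, u, grad):
        mi = beta1 * mi + (1.0 - beta1) * gi       # first-moment estimate
        ui = beta2 * ui + (1.0 - beta2) * gi * gi  # second-moment estimate
        xi = xi - lr * bounded_scale(ui) * mi      # step with clipped scaling
        new_x.append(xi)
        new_m.append(mi)
        new_u.append(ui)
    return new_x, new_m, new_u
```

A caller would evaluate `increasing_beta1(t)` at each iteration `t` and pass the result as `beta1`, so that early steps follow the raw gradient and later steps average over a longer history.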
Abstract
Although adaptive optimization algorithms have been successful in many applications, there are still some mysteries in terms of convergence analysis that have not been unraveled. This paper provides a novel non-convex analysis of adaptive optimization to uncover some of these mysteries. Our contributions are three-fold. First, we show that an increasing or large enough momentum parameter for the first-order moment used in practice is sufficient to ensure the convergence of adaptive algorithms whose adaptive scaling factors of the step size are bounded. Second, our analysis gives insights for practical implementations, e.g., increasing the momentum parameter in a stage-wise manner in accordance with stagewise decreasing step size would help improve the convergence. Third, the modular nature of our analysis allows its extension to solving other optimization problems, e.g., compositional,…
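The stage-wise idea in the second contribution can be sketched as a schedule that shrinks the step size and raises the momentum parameter together at each stage. The halving rule, the function name `stagewise_schedule`, and the default values below are assumptions for illustration only; the abstract does not specify the exact rule.

```python
def stagewise_schedule(num_stages, lr0=0.1, beta0=0.9, decay=2.0):
    """Return a list of (lr, beta1) pairs, one per stage.

    Assumed rule (hypothetical): at stage k the step size and the gap
    (1 - beta1) are both divided by decay**k, so the momentum parameter
    increases as the step size decreases.
    """
    schedule = []
    for k in range(num_stages):
        factor = decay ** k
        lr = lr0 / factor
        beta1 = 1.0 - (1.0 - beta0) / factor
        schedule.append((lr, beta1))
    return schedule
```

Each stage would run the optimizer for a fixed number of iterations with its `(lr, beta1)` pair before moving to the next one.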
Taxonomy
Topics: Topological and Geometric Data Analysis · Advanced Optimization Algorithms Research
Methods: AdaBound · AMSGrad · Adam
