Asymptotic Convergence and Stability of Adaptive Gradient Methods in Smooth Non-convex Optimization
Ruinan Jin, Xiaoyu Wang

TL;DR
This paper presents the first rigorous asymptotic convergence analysis of AdaGrad-Norm in smooth non-convex optimization, establishing stability together with almost sure and mean-square convergence, and extends the analysis to RMSProp under appropriate hyperparameter choices.
Contribution
It introduces a novel stopping-time partitioning technique to analyze stability and convergence of adaptive gradient methods in non-convex settings.
Findings
AdaGrad-Norm remains stable and converges in non-convex optimization.
RMSProp also achieves stability and convergence with proper hyperparameters.
Objective function values stay bounded in expectation, and the iterates stay bounded almost surely under a mild coercivity assumption.
Abstract
Adaptive gradient methods, such as AdaGrad, have become fundamental tools in deep learning. Despite their widespread use, the asymptotic convergence of AdaGrad remains poorly understood in non-convex scenarios. In this work, we present the first rigorous asymptotic convergence analysis of AdaGrad-Norm for smooth non-convex optimization. Using a novel stopping-time partitioning technique, we establish a key stability result: the objective function values remain bounded in expectation, and the iterates are bounded almost surely under a mild coercivity assumption. Building on these stability results, we prove that AdaGrad-Norm achieves both almost sure and mean-square convergence. Furthermore, we extend our analysis to RMSProp and show that, with appropriate hyperparameter choices, it also enjoys stability and asymptotic convergence. The techniques developed herein may be of independent…
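For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of the commonly used AdaGrad-Norm update: a single scalar accumulator of squared gradient norms sets one shared step size for all coordinates. The function name `adagrad_norm`, the parameters `alpha` and `s0`, and the toy objective are illustrative assumptions; the paper's exact notation, initialization, and noise assumptions may differ.

```python
import numpy as np


def adagrad_norm(grad_fn, theta0, alpha=1.0, s0=1e-8, steps=1000):
    """Run AdaGrad-Norm (standard scalar-accumulator form, assumed here).

    grad_fn: callable returning a (possibly stochastic) gradient at theta.
    alpha:   base step size (hypothetical name).
    s0:      small positive initial accumulator to avoid division by zero.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    s = float(s0)
    for _ in range(steps):
        g = np.asarray(grad_fn(theta), dtype=float)
        # Accumulate the squared Euclidean norm of the gradient.
        s += float(np.dot(g, g))
        # One scalar step size alpha / sqrt(s) shared by every coordinate.
        theta -= alpha / np.sqrt(s) * g
    return theta, s


# Toy smooth, non-convex, coercive objective (an assumption for illustration):
# f(x) = sum(x**2 + 3 * sin(x)**2), with gradient 2*x + 3*sin(2*x).
rng = np.random.default_rng(0)


def noisy_grad(x):
    # Exact gradient plus zero-mean Gaussian noise, mimicking a stochastic oracle.
    return 2.0 * x + 3.0 * np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.shape)


theta_final, s_final = adagrad_norm(noisy_grad, np.array([3.0, -2.0]), steps=2000)
```

Because the accumulator `s` never decreases, the effective step size `alpha / sqrt(s)` shrinks automatically as gradients are observed, with no hand-tuned decay schedule.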
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
