Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions

Bohan Wang; Huishuai Zhang; Zhi-Ming Ma; Wei Chen

arXiv:2305.18471·cs.LG·September 29, 2023·5 cites

TL;DR

This paper presents a simplified convergence proof for AdaGrad on non-convex functions, achieving tighter iteration bounds and extending analysis to local smoothness conditions, thereby broadening understanding of AdaGrad's effectiveness.

Contribution

The authors introduce a novel auxiliary function to simplify convergence proofs for AdaGrad, improving bounds and extending analysis to local smoothness assumptions.

Findings

01

AdaGrad converges in $O(\frac{1}{ε^{2}})$ iterations in the over-parameterized regime.

02

Tighter convergence rates than the previous $O(\frac{1}{ε^{4}})$ bounds are established.

03

Convergence under $(L_0, L_1)$-smoothness depends on the learning rate, and this dependence is shown to be necessary.
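
For readers unfamiliar with the term, $(L_0, L_1)$-smoothness is commonly stated in the gradient-clipping literature as a bound on the Hessian norm that grows with the gradient norm. The form below is that common statement, given here as an assumption for orientation; the paper's exact (local) formulation may differ:

$$\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$$

Setting $L_1 = 0$ recovers the usual bounded-smoothness assumption with constant $L_0$.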

Abstract

We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions. The proof is essentially based on a novel auxiliary function $ξ$ that helps eliminate the complexity of handling the correlation between the numerator and denominator of AdaGrad's update. Leveraging simple proofs, we are able to obtain tighter results than existing results (Faw et al., 2022) and extend the analysis to several new and important cases. Specifically, for the over-parameterized regime, we show that AdaGrad needs only $O(\frac{1}{ε^{2}})$ iterations to ensure the gradient norm is smaller than $ε$, which matches the rate of SGD and is significantly tighter than the existing rates of $O(\frac{1}{ε^{4}})$ for AdaGrad. We then discard the bounded smoothness assumption and consider a realistic…
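
To make the "numerator and denominator" remark concrete, here is a minimal sketch of the norm version of AdaGrad (one scalar accumulator shared by all coordinates). It is an illustration under stated assumptions, not the paper's code: the function names, the toy objective $f(w) = \frac{1}{2}\|w\|^2$, the noise-free gradients, and the hyperparameter values are all hypothetical choices made for the example.

```python
import math


def adagrad_norm(grad_fn, w0, lr=1.0, v0=1e-8, steps=200):
    """Run AdaGrad-Norm: w <- w - lr * g / sqrt(v), with v the running sum of ||g||^2.

    grad_fn, lr, v0 and steps are illustrative parameters, not values from the paper.
    """
    w = list(w0)
    v = v0  # accumulator of squared gradient norms (the denominator, before the sqrt)
    for _ in range(steps):
        g = grad_fn(w)
        # The current gradient enters the accumulator before the step is taken,
        # so the same g appears in both the numerator and the denominator.
        v += sum(gi * gi for gi in g)
        step = lr / math.sqrt(v)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w, v


def quadratic_grad(w):
    """Exact gradient of the toy objective f(w) = 0.5 * ||w||^2 (an assumption)."""
    return list(w)


w_final, v_final = adagrad_norm(quadratic_grad, [3.0, -4.0])
grad_norm = math.sqrt(sum(gi * gi for gi in quadratic_grad(w_final)))
print(grad_norm, v_final)
```

With stochastic gradients, the step `g / sqrt(v)` is a ratio of two quantities that both depend on the same random sample, which is the correlation the abstract says the auxiliary function $ξ$ is designed to handle. The deterministic toy run above only shows the mechanics of the update.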

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent · AdaGrad