A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality

Kushal Chakrabarti; Mayank Baranwal

arXiv:2407.12629·cs.LG·July 18, 2024

Open Access

TL;DR

This paper proves that popular adaptive gradient methods like AdaGrad and Adam achieve linear convergence when optimizing functions that satisfy the Polyak-Łojasiewicz inequality, providing new theoretical guarantees.

Contribution

The paper establishes the first linear convergence proofs for AdaGrad and Adam under the PL inequality, with a single analysis that covers both batch and stochastic gradients.
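For readers unfamiliar with the terms, the two notions involved can be written out as follows. This is the standard textbook formulation, not text taken from the paper; the symbols \mu (PL constant), L (gradient Lipschitz constant), \rho (contraction factor), and f^* (minimum value) are notation assumed here for illustration. The constants the paper obtains for AdaGrad and Adam are not reproduced.

```latex
% PL inequality with constant \mu > 0:
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \,\bigl( f(x) - f^{*} \bigr)
  \qquad \text{for all } x .
\]

% Linear (geometric) convergence of the iterates x_k:
\[
  f(x_k) - f^{*} \;\le\; \rho^{\,k} \,\bigl( f(x_0) - f^{*} \bigr),
  \qquad 0 \le \rho < 1 .
\]

% Known reference case: gradient descent with step size 1/L on an
% L-smooth function satisfying the PL inequality attains
\[
  \rho \;=\; 1 - \frac{\mu}{L} .
\]
```

The PL inequality does not require convexity, which is why it is considered a weaker condition than strong convexity.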

Findings

01 AdaGrad and Adam converge linearly under the PL inequality.

02 The framework applies to both batch and stochastic gradients.

03 The framework may extend to the analysis of other Adam variants.
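The first finding can be illustrated numerically with a minimal sketch. It runs a textbook AdaGrad update on f(x) = x^2 + 3 sin^2(x), a one-dimensional nonconvex function commonly cited as satisfying the PL inequality, and records the gap f(x_k) - f*. Everything here is an assumption made for illustration: the test function, the starting point, the step size `lr`, and the helper name `adagrad_gaps` are not from the paper, and the update may differ from the exact AdaGrad variant and step-size conditions the paper analyzes.

```python
import math


def f(x):
    # Nonconvex test objective (illustrative choice, not from the paper).
    # Its minimum value is f* = 0, attained at x = 0.
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def adagrad_gaps(x0=3.0, lr=1.0, steps=40, eps=1e-8):
    """Run textbook AdaGrad and return the gaps f(x_k) - f* (here f* = 0)."""
    x = x0
    accum = 0.0  # running sum of squared gradients
    gaps = [f(x)]
    for _ in range(steps):
        g = grad_f(x)
        accum += g * g
        # The effective step size lr / sqrt(accum) adapts to past gradients.
        x -= lr * g / (math.sqrt(accum) + eps)
        gaps.append(f(x))
    return gaps


if __name__ == "__main__":
    gaps = adagrad_gaps()
    for k in range(0, len(gaps), 5):
        print(f"k={k:2d}  f(x_k) - f* = {gaps[k]:.3e}")
```

If the gap shrinks by a roughly constant factor per iteration, the printed values fall on an approximately straight line on a log scale, which is what a linear rate looks like in practice. A single run like this is only a sanity check and says nothing about the general guarantees proved in the paper.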

Abstract

Adaptive gradient-descent optimizers are the standard choice for training neural network models. Despite their faster convergence than gradient-descent and remarkable performance in practice, the adaptive optimizers are not as well understood as vanilla gradient-descent. A reason is that the dynamic update of the learning rate that helps in faster convergence of these methods also makes their analysis intricate. Particularly, the simple gradient-descent method converges at a linear rate for a class of optimization problems, whereas the practically faster adaptive gradient methods lack such a theoretical guarantee. The Polyak-Łojasiewicz (PL) inequality is the weakest known class, for which linear convergence of gradient-descent and its momentum variants has been proved. Therefore, in this paper, we prove that AdaGrad and Adam, two well-known adaptive gradient methods, converge…

Taxonomy

Topics: Advanced Numerical Analysis Techniques

Methods: Adam · AdaGrad