SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

Amit Attia; Tomer Koren

arXiv:2302.08783 · cs.LG · June 13, 2023

TL;DR

This paper provides a comprehensive high-probability analysis of stochastic gradient descent with AdaGrad stepsizes, showing that the method is fully adaptive to unknown problem parameters, unbounded gradients, and affine variance noise in both convex and non-convex (smooth) settings.

Contribution

It offers the first analysis of AdaGrad free of the limitations of prior work (assumed knowledge of problem parameters, strong global Lipschitz conditions, or bounds that do not hold with high probability), while supporting a general noise model and giving sharp convergence rates.

Findings

01 Supports a general affine variance noise model
02 Provides sharp convergence rates in both the low-noise and high-noise regimes
03 Achieves high-probability bounds without strong global (Lipschitz) assumptions
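The affine variance noise model lets the gradient noise grow with the true gradient norm. In its commonly stated form (an assumption here; the paper's precise condition may differ, for example by requiring the bound almost surely rather than in expectation), a stochastic gradient g(x) satisfies E‖g(x) − ∇f(x)‖² ≤ σ₀² + σ₁²‖∇f(x)‖², which reduces to the usual bounded-variance assumption when σ₁ = 0. The oracle below, affine_noise_gradient, is a hypothetical illustration of noise that meets this bound, not code from the paper.

```python
import math
import random


def affine_noise_gradient(grad, sigma0, sigma1, rng):
    """Hypothetical stochastic-gradient oracle with affine-variance noise.

    Returns grad + xi, where xi is mean-zero and
    ||xi||**2 <= sigma0**2 + sigma1**2 * ||grad||**2
    (with equality except on a probability-zero event), so the affine
    bound holds almost surely and therefore also in expectation.
    """
    grad_sq = sum(g * g for g in grad)
    # Noise magnitude grows with the true gradient norm: the "affine" part.
    radius = math.sqrt(sigma0 ** 2 + sigma1 ** 2 * grad_sq)
    # Direction uniform on the unit sphere, which makes the noise mean-zero.
    z = [rng.gauss(0.0, 1.0) for _ in grad]
    z_norm = math.sqrt(sum(v * v for v in z)) or 1.0  # guard the z == 0 case
    return [g + radius * v / z_norm for g, v in zip(grad, z)]


rng = random.Random(0)
true_grad = [3.0, 4.0]  # ||grad||**2 = 25
noisy = affine_noise_gradient(true_grad, sigma0=0.1, sigma1=0.5, rng=rng)
noise_sq = sum((a - b) ** 2 for a, b in zip(noisy, true_grad))
print(round(noise_sq, 6))  # 0.1**2 + 0.5**2 * 25 = 6.26
```

With σ₁ > 0, the noise is largest far from a stationary point and shrinks toward σ₀ as the gradient vanishes, which is why a uniform variance bound would be overly pessimistic here.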

Abstract

We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general "affine variance" noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.
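The abstract describes SGD whose stepsize tunes itself from the observed gradients rather than from known problem parameters. Below is a minimal Python sketch assuming the scalar "AdaGrad-Norm" stepsize η_t = η / √(γ² + Σ_{s≤t} ‖g_s‖²) with update x_{t+1} = x_t − η_t g_t; the paper's exact parameterization may differ. The function adagrad_norm_sgd, the oracle noisy_grad, the quadratic test problem, and the values of eta and gamma are hypothetical, chosen only for illustration.

```python
import math
import random


def adagrad_norm_sgd(grad_oracle, x0, eta, gamma, steps):
    """SGD with scalar AdaGrad stepsizes (the "AdaGrad-Norm" form), a sketch.

    Assumed update, with gamma > 0 to avoid dividing by zero:
        eta_t   = eta / sqrt(gamma**2 + sum_{s<=t} ||g_s||**2)
        x_{t+1} = x_t - eta_t * g_t
    No smoothness constant or noise level is passed in; the stepsize is
    tuned only by the observed stochastic gradient norms.
    """
    x = list(x0)
    sum_sq = 0.0  # running sum of squared gradient norms, including g_t
    for _ in range(steps):
        g = grad_oracle(x)
        sum_sq += sum(gi * gi for gi in g)
        eta_t = eta / math.sqrt(gamma ** 2 + sum_sq)
        x = [xi - eta_t * gi for xi, gi in zip(x, g)]
    return x


# Usage: f(x) = 0.5 * ||x||**2, so grad f(x) = x. The hypothetical oracle adds
# multiplicative plus additive Gaussian noise whose variance is affine in
# ||grad f(x)||**2 (namely 0.25 * ||x||**2 + 1e-4 * dim).
rng = random.Random(0)


def noisy_grad(x):
    return [xi * (1.0 + 0.5 * rng.gauss(0.0, 1.0)) + 0.01 * rng.gauss(0.0, 1.0)
            for xi in x]


x_final = adagrad_norm_sgd(noisy_grad, x0=[5.0, -3.0], eta=1.0, gamma=1e-3, steps=2000)
print(math.sqrt(sum(v * v for v in x_final)))  # small: near the minimizer at 0
```

Because the denominator already includes the current gradient g_t, every step has length at most η, which keeps the iteration stable without knowing a smoothness constant in advance.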

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Stochastic processes and financial applications

Methods: AdaGrad