Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization
Ruichen Jiang, Devyani Maladkar, Aryan Mokhtari

TL;DR
This paper proves that under new assumptions, AdaGrad can outperform SGD in non-convex stochastic optimization, providing the first theoretical evidence of adaptive methods' advantage in such settings.
Contribution
It introduces refined assumptions and a new analysis framework showing AdaGrad's provable complexity improvement over SGD in non-convex optimization.
Findings
AdaGrad's iteration complexity improves on SGD's by a factor of d, the problem dimension, in certain non-convex problems.
The paper establishes tight upper and lower bounds for AdaGrad and SGD.
First theoretical demonstration of adaptive gradient methods' advantage in non-convex stochastic optimization.
Abstract
Adaptive gradient methods, such as AdaGrad, are among the most successful optimization algorithms for neural network training. While these methods are known to achieve better dimensional dependence than stochastic gradient descent (SGD) for stochastic convex optimization under favorable geometry, the theoretical justification for their success in stochastic non-convex optimization remains elusive. In fact, under standard assumptions of Lipschitz gradients and bounded noise variance, it is known that SGD is worst-case optimal in terms of finding a near-stationary point with respect to the $\ell_2$-norm, making further improvements impossible. Motivated by this limitation, we introduce refined assumptions on the smoothness structure of the objective and the gradient noise variance, which better suit the coordinate-wise nature of adaptive gradient methods. Moreover, we adopt the $\ell_1$-norm of…
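To make the coordinate-wise distinction concrete, here is a minimal NumPy sketch that assumes the standard diagonal form of AdaGrad; the paper's exact variant and step-size choices may differ. The function names, the `eps` safeguard, and the toy gradient are illustrative assumptions, not taken from the paper. The sketch contrasts a single SGD step (one scalar step size for all coordinates) with an AdaGrad step (per-coordinate scaling by accumulated squared gradients), and computes the $\ell_1$- and $\ell_2$-norms of a gradient, which always satisfy $\|g\|_2 \le \|g\|_1 \le \sqrt{d}\,\|g\|_2$.

```python
import numpy as np


def sgd_step(x, g, lr):
    """Plain SGD: the same scalar step size lr is applied to every coordinate."""
    return x - lr * g


def adagrad_step(x, g, accum, lr, eps=1e-8):
    """Diagonal AdaGrad (assumed standard form, not necessarily the paper's).

    accum holds the running sum of squared gradients per coordinate, so
    each coordinate gets its own effective step size lr / sqrt(accum).
    eps is a small constant added only to avoid division by zero.
    """
    accum = accum + g * g
    x = x - lr * g / (np.sqrt(accum) + eps)
    return x, accum


def stationarity_measures(g):
    """Return (l1_norm, l2_norm) of a gradient vector g."""
    return float(np.linalg.norm(g, 1)), float(np.linalg.norm(g, 2))


if __name__ == "__main__":
    g = np.array([3.0, -4.0])          # toy stochastic gradient (made up)
    x0 = np.zeros(2)

    print("SGD step:    ", sgd_step(x0, g, lr=0.1))
    x1, accum = adagrad_step(x0, g, np.zeros(2), lr=0.1)
    print("AdaGrad step:", x1)
    print("l1, l2 norms:", stationarity_measures(g))
```

On the first step AdaGrad moves every coordinate by roughly `lr` regardless of the gradient's magnitude in that coordinate, whereas the SGD step is proportional to each gradient entry. This per-coordinate normalization is the behavior that coordinate-wise assumptions are meant to capture.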
Taxonomy
Topics: Iterative Methods for Nonlinear Equations · Advanced Optimization Algorithms Research · Numerical methods in inverse problems
Methods: ALIGN · AdaGrad · Stochastic Gradient Descent
