Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness
Michael Crawshaw, Mingrui Liu

TL;DR
This paper establishes fundamental lower bounds on the complexity of adaptive gradient algorithms in non-convex stochastic optimization under relaxed smoothness conditions, revealing inherent difficulties compared to standard smooth settings.
Contribution
The paper provides the first complexity lower bounds for adaptive algorithms in the $(L_0, L_1)$-smooth setting, showing at least quadratic dependence on problem parameters.
Findings
Decorrelated AdaGrad-Norm requires at least $\Omega(\Delta^2 L_1^2 \sigma^2 \epsilon^{-4})$ gradient queries.
Adaptive algorithms face at least quadratic dependence on problem parameters in the $(L_0, L_1)$-smooth setting.
The $(L_0, L_1)$-smooth setting is inherently more difficult than the standard smooth setting for certain adaptive methods.
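For context, the summary does not spell out the relaxed smoothness condition. A common statement for twice-differentiable functions (assumed here, following the usual definition in the literature, not quoted from this page) is $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, which reduces to standard $L$-smoothness when $L_1 = 0$. The sketch below checks this inequality on a 1-D grid for $f(x) = e^x$; the helper name `is_l0_l1_smooth_on_grid` and the grid are illustrative choices only.

```python
import math

def is_l0_l1_smooth_on_grid(grad, hess, l0, l1, points, tol=1e-12):
    """Check |f''(x)| <= l0 + l1 * |f'(x)| at every grid point (1-D case).

    A grid check is only a sanity test, not a proof of the condition.
    """
    return all(abs(hess(x)) <= l0 + l1 * abs(grad(x)) + tol for x in points)

# Illustrative grid on [-10, 10] with spacing 0.1.
points = [i / 10 for i in range(-100, 101)]

# f(x) = exp(x), so f'(x) = f''(x) = exp(x).
# With (L0, L1) = (0, 1) the inequality holds with equality everywhere.
exp_is_relaxed_smooth = is_l0_l1_smooth_on_grid(math.exp, math.exp, 0.0, 1.0, points)

# Standard smoothness with L = 10 corresponds to (L0, L1) = (10, 0);
# it fails once exp(x) exceeds 10, i.e. for x > ln(10).
exp_is_10_smooth = is_l0_l1_smooth_on_grid(math.exp, math.exp, 10.0, 0.0, points)

print(exp_is_relaxed_smooth, exp_is_10_smooth)
```

The exponential is a standard example of a function whose curvature grows with its gradient, so no single constant $L$ bounds it globally even though the relaxed condition holds.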
Abstract
Recent results in non-convex stochastic optimization demonstrate the convergence of popular adaptive algorithms (e.g., AdaGrad) under the $(L_0, L_1)$-smoothness condition, but the rate of convergence is a higher-order polynomial in terms of problem parameters like the smoothness constants. The complexity guaranteed by such algorithms to find an $\epsilon$-stationary point may be significantly larger than the optimal complexity of $\Theta(\Delta L \sigma^2 \epsilon^{-4})$ achieved by SGD in the $L$-smooth setting, where $\Delta$ is the initial optimality gap and $\sigma^2$ is the variance of the stochastic gradient. However, it is currently not known whether these higher-order dependencies can be tightened. To answer this question, we investigate complexity lower bounds for several adaptive optimization algorithms in the $(L_0, L_1)$-smooth setting, with a focus on the dependence…
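To make the "quadratic versus linear" contrast concrete, the following sketch evaluates the two expressions $\Delta^2 L_1^2 \sigma^2 \epsilon^{-4}$ (the lower bound for Decorrelated AdaGrad-Norm) and $\Delta L \sigma^2 \epsilon^{-4}$ (the SGD complexity in the $L$-smooth setting) with all hidden constants dropped. The function names and the numeric values are assumptions chosen for illustration; only the scaling behavior is meaningful, not the absolute numbers.

```python
def adaptive_lower_bound(delta, l1, sigma, eps):
    """Delta^2 * L1^2 * sigma^2 / eps^4, constants omitted (illustrative only)."""
    return delta**2 * l1**2 * sigma**2 / eps**4

def sgd_smooth_rate(delta, l, sigma, eps):
    """Delta * L * sigma^2 / eps^4, constants omitted (illustrative only)."""
    return delta * l * sigma**2 / eps**4

# Hypothetical values; only the ratios below are of interest.
base = dict(sigma=1.0, eps=0.1)

# Doubling the initial optimality gap Delta while holding everything else fixed.
ratio_adaptive = (adaptive_lower_bound(2.0, 1.0, **base)
                  / adaptive_lower_bound(1.0, 1.0, **base))
ratio_sgd = (sgd_smooth_rate(2.0, 1.0, **base)
             / sgd_smooth_rate(1.0, 1.0, **base))

print(ratio_adaptive, ratio_sgd)
```

Doubling $\Delta$ multiplies the first expression by four and the second by two, which is the sense in which the lower bound is quadratic in $\Delta$ while the smooth-setting rate is linear.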
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: AdaGrad · Focus · Stochastic Gradient Descent
