Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness

Michael Crawshaw; Mingrui Liu

arXiv:2505.04599 · cs.LG · May 9, 2025

TL;DR

This paper establishes fundamental lower bounds on the complexity of adaptive gradient algorithms in non-convex stochastic optimization under relaxed smoothness conditions, revealing inherent difficulties compared to the standard smooth setting.

Contribution

It provides the first complexity lower bounds for adaptive algorithms in the $(L_0, L_1)$-smooth setting, showing that these algorithms incur at least a quadratic dependence on problem parameters.
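For reference, the $(L_0, L_1)$-smoothness condition is commonly stated in the twice-differentiable form below; the paper may use an equivalent or more general variant. The example function is an illustration chosen here, not one taken from the paper.

```latex
% (L_0, L_1)-smoothness, twice-differentiable form:
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x.
% With L_1 = 0 this is standard L-smoothness with L = L_0.
% Example: f(x) = e^x gives f''(x) = e^x = f'(x), so f is (0, 1)-smooth,
% but f'' is unbounded, so f is not L-smooth for any finite L.
```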

Findings

01

Decorrelated AdaGrad-Norm requires at least $\Omega(\Delta^2 L_1^2 \sigma^2 \epsilon^{-4})$ stochastic gradient queries to find an $\epsilon$-stationary point.

02

Adaptive algorithms face at least quadratic dependence on problem parameters in the $(L_0, L_1)$-smooth setting.

03

The $(L_0, L_1)$-smooth setting is inherently more difficult than the standard smooth setting for certain adaptive methods.
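To make the algorithm in the first finding concrete, here is a minimal NumPy sketch of AdaGrad-Norm (a single scalar step size shared by all coordinates) and of a "decorrelated" variant. This page does not define the variant, so the sketch assumes that decorrelation means the step size at iteration $t$ is built only from gradients seen before $t$, making it independent of the current stochastic gradient. The helper name `adagrad_norm`, the oracle interface, and the constants `eta` and `b0` are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def adagrad_norm(grad_oracle, x0, eta, b0, steps, decorrelated=False):
    """AdaGrad-Norm with a single scalar step size (hypothetical helper).

    grad_oracle(x) must return a stochastic gradient at x (assumed interface).
    With decorrelated=True, the step size at iteration t uses only gradients
    from iterations 0..t-1 (an assumed reading of "decorrelated").
    """
    x = np.asarray(x0, dtype=float)
    acc = b0 ** 2  # b0^2 plus the running sum of squared gradient norms
    for _ in range(steps):
        g = grad_oracle(x)
        sq_norm = float(np.dot(g, g))
        if decorrelated:
            step = eta / np.sqrt(acc)  # independent of the current g
            acc += sq_norm
        else:
            acc += sq_norm             # standard: current g enters the step size
            step = eta / np.sqrt(acc)
        x = x - step * g
    return x


# Toy usage: f(x) = 0.5 * ||x||^2 with additive Gaussian gradient noise.
rng = np.random.default_rng(0)
sigma = 0.1
noisy_grad = lambda x: x + sigma * rng.standard_normal(x.shape)

x_std = adagrad_norm(noisy_grad, np.ones(2), eta=1.0, b0=1.0, steps=500)
x_dec = adagrad_norm(noisy_grad, np.ones(2), eta=1.0, b0=1.0, steps=500, decorrelated=True)
print(np.linalg.norm(x_std), np.linalg.norm(x_dec))
```

The only difference between the two branches is whether the current squared gradient norm is added to the accumulator before or after the step size is computed. In the decorrelated branch, the step size and the current gradient are statistically independent given the past.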

Abstract

Recent results in non-convex stochastic optimization demonstrate the convergence of popular adaptive algorithms (e.g., AdaGrad) under the $(L_0, L_1)$-smoothness condition, but the rate of convergence is a higher-order polynomial in terms of problem parameters like the smoothness constants. The complexity guaranteed by such algorithms to find an $\epsilon$-stationary point may be significantly larger than the optimal complexity of $\Theta(\Delta L \sigma^2 \epsilon^{-4})$ achieved by SGD in the $L$-smooth setting, where $\Delta$ is the initial optimality gap and $\sigma^2$ is the variance of the stochastic gradient. However, it is currently not known whether these higher-order dependencies can be tightened. To answer this question, we investigate complexity lower bounds for several adaptive optimization algorithms in the $(L_0, L_1)$-smooth setting, with a focus on the dependence…
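The SGD rate quoted in the abstract and the decorrelated AdaGrad-Norm lower bound in the first finding can be compared by how each grows when one parameter changes. A minimal sketch, assuming all hidden constants equal 1, so the outputs are only scaling factors, not actual query counts. The function names `sgd_rate`, `decorrelated_lower_bound` and `growth` are hypothetical. `L` and `L1` are constants from different settings, so only the growth factors are meaningful, not the absolute values.

```python
def sgd_rate(delta, L, sigma, eps):
    # Leading term of Theta(Delta * L * sigma^2 * eps^-4): SGD, L-smooth setting
    return delta * L * sigma**2 / eps**4


def decorrelated_lower_bound(delta, L1, sigma, eps):
    # Leading term of Omega(Delta^2 * L1^2 * sigma^2 * eps^-4): decorrelated AdaGrad-Norm
    return delta**2 * L1**2 * sigma**2 / eps**4


def growth(bound, base, **changes):
    """Factor by which `bound` grows when `changes` override the `base` parameters."""
    return bound(**{**base, **changes}) / bound(**base)


base_sgd = dict(delta=1.0, L=1.0, sigma=1.0, eps=0.5)
base_ada = dict(delta=1.0, L1=1.0, sigma=1.0, eps=0.5)

print(growth(sgd_rate, base_sgd, delta=2.0))                  # 2.0: linear in Delta
print(growth(decorrelated_lower_bound, base_ada, delta=2.0))  # 4.0: quadratic in Delta
print(growth(decorrelated_lower_bound, base_ada, L1=2.0))     # 4.0: quadratic in L1
print(growth(sgd_rate, base_sgd, eps=0.25))                   # 16.0: eps^-4 in both
```

Doubling $\Delta$ doubles the SGD term but quadruples the lower bound, which is the quadratic dependence described in the findings; halving $\epsilon$ multiplies both by 16.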

Videos

Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: AdaGrad · Focus · Stochastic Gradient Descent