Lower Bounds and Proximally Anchored SGD for Non-Convex Minimization Under Unbounded Variance
Arda Fazla, Ege C. Kaya, Antesh Upadhyay, Abolfazl Hashemi

TL;DR
This paper establishes lower bounds for stochastic non-convex optimization under unbounded variance conditions and introduces PASTA, an algorithm that achieves optimal complexity matching these bounds across various non-convex regimes.
Contribution
It provides the first information-theoretic lower bounds under BG-0 variance growth and proposes PASTA, a unified algorithm that attains these bounds in diverse non-convex settings.
Findings
Lower bounds of $\Omega(\epsilon^{-6})$ and $\Omega(\epsilon^{-4})$ stochastic oracle queries are established.
The PASTA algorithm achieves minimax-optimal complexities matching these lower bounds.
Results hold for unbounded domains and stochastic gradients in multiple non-convex regimes.
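To make the gap between the two bounds concrete, the minimal Python sketch below compares how $\epsilon^{-6}$ and $\epsilon^{-4}$ grow as the target accuracy $\epsilon$ shrinks. It ignores the constants hidden by $\Omega(\cdot)$, so it only illustrates leading-order scaling. The helper `query_scaling` is a hypothetical name used for illustration, not something from the paper.

```python
# Illustrative only: compare how eps^-6 and eps^-4 grow as the target
# accuracy eps shrinks. Constants hidden by Omega(.) are ignored.

def query_scaling(eps: float, power: int) -> float:
    """Return eps**(-power), the leading-order query count up to constants."""
    return eps ** (-power)


for eps in (0.1, 0.05, 0.01):
    q6 = query_scaling(eps, 6)
    q4 = query_scaling(eps, 4)
    # The ratio equals eps**-2, so halving eps multiplies the gap by 4.
    print(f"eps={eps:<5} eps^-6={q6:.3e} eps^-4={q4:.3e} ratio={q6 / q4:.3e}")
```

At $\epsilon = 0.01$, for example, the two rates differ by a factor of $\epsilon^{-2} = 10^4$ before constants are taken into account.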
Abstract
Analysis of Stochastic Gradient Descent (SGD) and its variants typically relies on the assumption of uniformly bounded variance, a condition that frequently fails in practical non-convex settings such as neural network training, as well as in several elementary optimization settings. While several relaxations have been explored in the literature, the Blum-Gladyshev (BG-0) condition, which permits the variance to grow quadratically with distance, has recently been shown to be the weakest such condition. However, the oracle complexity of stochastic first-order non-convex optimization under BG-0 has remained underexplored. In this paper, we address this gap and establish information-theoretic lower bounds, proving that finding an $\epsilon$-stationary point requires $\Omega(\epsilon^{-6})$ stochastic BG-0 oracle queries for smooth functions and $\Omega(\epsilon^{-4})$ queries under…
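The abstract describes BG-0 only informally, as variance that may grow quadratically with distance. One common way to write such a bound, assumed here rather than taken from the paper, is $\mathbb{E}\|g(x) - \nabla f(x)\|^2 \le \sigma_0^2 + \sigma_1^2 \|x - x_0\|^2$ for some anchor point $x_0$. The minimal Python sketch below builds a hypothetical unbiased oracle that meets this bound with equality on the toy function $f(x) = \tfrac{1}{2}\|x\|^2$ and estimates the variance by Monte Carlo. The names `bg0_oracle`, `anchor`, `sigma0`, and `sigma1` are illustrative assumptions, not the authors' code.

```python
import math
import random


def true_grad(x):
    """Gradient of the hypothetical test function f(x) = 0.5 * ||x||^2."""
    return list(x)


def bg0_oracle(x, anchor, sigma0, sigma1, rng):
    """Hypothetical unbiased stochastic oracle whose total noise variance is
    sigma0**2 + sigma1**2 * ||x - anchor||**2 (quadratic growth in distance)."""
    d = len(x)
    dist_sq = sum((xi - ai) ** 2 for xi, ai in zip(x, anchor))
    total_var = sigma0 ** 2 + sigma1 ** 2 * dist_sq
    # Split the variance evenly across coordinates (Gaussian noise, an assumption).
    per_coord_std = math.sqrt(total_var / d)
    return [g + rng.gauss(0.0, per_coord_std) for g in true_grad(x)]


def empirical_variance(x, anchor, sigma0, sigma1, n_samples, seed=0):
    """Monte Carlo estimate of E||g(x) - grad f(x)||^2 at the point x."""
    rng = random.Random(seed)
    grad = true_grad(x)
    total = 0.0
    for _ in range(n_samples):
        g = bg0_oracle(x, anchor, sigma0, sigma1, rng)
        total += sum((gi - ti) ** 2 for gi, ti in zip(g, grad))
    return total / n_samples


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    for radius in (0.0, 1.0, 10.0, 100.0):
        x = [radius, 0.0]
        est = empirical_variance(x, anchor, sigma0=1.0, sigma1=0.5, n_samples=20000)
        formula = 1.0 + 0.25 * radius ** 2  # sigma0**2 + sigma1**2 * ||x||**2
        print(f"||x||={radius:>6}: empirical var={est:10.2f}  formula={formula:10.2f}")
```

Because the noise variance is unbounded as $\|x - x_0\| \to \infty$, no single constant $\sigma^2$ bounds it over an unbounded domain, which is exactly the regime where the uniformly bounded variance assumption breaks down.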
