Lower Bounds for Non-Convex Stochastic Optimization
Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

TL;DR
This paper establishes fundamental lower bounds on the number of stochastic gradient queries needed to find approximate stationary points in non-convex optimization, confirming the optimality of existing algorithms.
Contribution
It provides tight lower bounds for stochastic first-order methods in non-convex optimization, demonstrating the optimality of stochastic gradient descent and variance reduction techniques.
Findings
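In the worst case, any algorithm requires at least $\epsilon^{-4}$ stochastic gradient queries to find an $\epsilon$-stationary point.
The $\epsilon^{-4}$ lower bound is tight: it matches the known query complexity of stochastic gradient descent.
In a more restrictive model, where the noisy gradient estimates satisfy a mean-squared smoothness property, at least $\epsilon^{-3}$ queries are needed, confirming the optimality of variance reduction methods.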
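For reference, the three conditions named in these findings can be written out as follows. This is a sketch in standard notation, not text from the page: the symbols $F$ (objective), $g(x, z)$ (stochastic gradient with random seed $z$), $\sigma^2$ (variance bound), and $\bar{L}$ (mean-squared smoothness constant) are assumed names, and the dependence of the bounds on these constants is omitted.

```latex
% epsilon-stationary point: the gradient norm is at most epsilon
\|\nabla F(x)\| \le \epsilon

% Unbiased stochastic gradient oracle with bounded variance
\mathbb{E}_z\, g(x, z) = \nabla F(x),
\qquad
\mathbb{E}_z \|g(x, z) - \nabla F(x)\|^2 \le \sigma^2

% Mean-squared smoothness (the more restrictive model)
\mathbb{E}_z \|g(x, z) - g(y, z)\|^2 \le \bar{L}^2 \|x - y\|^2
\quad \text{for all } x, y
```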
Abstract
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of $\epsilon^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques.
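The oracle model in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the paper: the test function, the Gaussian noise model, and the names `grad_F`, `stochastic_gradient`, and `sgd_queries` are all assumptions chosen for the example. It runs plain SGD against an unbiased, bounded-variance gradient oracle and counts queries until the exact gradient norm drops to the target. The stopping check reads the exact gradient, which a real algorithm in this model could not do, so it serves purely as a diagnostic.

```python
import math
import random


def grad_F(x):
    """Exact gradient of F(x) = sum_i x_i^2 / (1 + x_i^2), a smooth non-convex function."""
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def stochastic_gradient(x, sigma, rng):
    """Hypothetical oracle: exact gradient plus zero-mean Gaussian noise.

    Each coordinate gets variance sigma^2 / d, so E||noise||^2 = sigma^2.
    """
    scale = sigma / math.sqrt(len(x))
    return [g + rng.gauss(0.0, scale) for g in grad_F(x)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def sgd_queries(x0, eps, sigma, step, max_queries, seed=0):
    """Run SGD and return (oracle queries used, final point).

    Stops as soon as the exact gradient norm is <= eps (diagnostic check only),
    or after max_queries oracle calls.
    """
    rng = random.Random(seed)
    x = list(x0)
    for t in range(max_queries):
        if norm(grad_F(x)) <= eps:
            return t, x
        g = stochastic_gradient(x, sigma, rng)  # one oracle query
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return max_queries, x


if __name__ == "__main__":
    # Noise-free run (sigma = 0) reduces to gradient descent.
    queries, x = sgd_queries([1.0, -0.5], eps=1e-3, sigma=0.0, step=0.5, max_queries=10_000)
    print("noise-free queries:", queries, "gradient norm:", norm(grad_F(x)))

    # Noisy run: a smaller step is needed to average out the noise.
    queries, x = sgd_queries([1.0, -0.5], eps=0.1, sigma=1.0, step=0.005, max_queries=200_000)
    print("noisy queries:", queries, "gradient norm:", norm(grad_F(x)))
```

In the standard upper-bound analysis of SGD, the step size is taken proportional to $\epsilon^2$ when the noise dominates, which is where the $\epsilon^{-4}$ query count comes from; a single benign test function like this one will not exhibit the worst-case rate, so the script is a picture of the query model rather than a reproduction of the bound.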
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
