Lower Bounds for Non-Convex Stochastic Optimization

Yossi Arjevani; Yair Carmon; John C. Duchi; Dylan J. Foster; Nathan Srebro; Blake Woodworth

arXiv:1912.02365·math.OC·March 1, 2022·6 cites

TL;DR

This paper establishes lower bounds on the number of stochastic gradient queries needed to find approximate stationary points in non-convex optimization, showing that stochastic gradient descent and recent variance reduction methods are optimal in their respective oracle models.

Contribution

It provides tight lower bounds for stochastic first-order methods in non-convex optimization, demonstrating the optimality of stochastic gradient descent and variance reduction techniques.

Findings

01

In the worst case, any algorithm requires at least $ϵ^{-4}$ stochastic gradient queries to find an $ϵ$-stationary point.

02

The lower bound of $ϵ^{-4}$ is tight and matches the performance of stochastic gradient descent.

03

In a more restrictive model, where the noisy gradient estimates satisfy a mean-squared smoothness property, at least $ϵ^{-3}$ queries are needed, confirming the optimality of variance reduction methods.

Abstract

We lower bound the complexity of finding $ϵ$-stationary points (with gradient norm at most $ϵ$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ϵ^{-4}$ queries to find an $ϵ$-stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of $ϵ^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques.
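To make the oracle model in the abstract concrete, here is a minimal sketch of stochastic gradient descent querying an unbiased, bounded-variance gradient oracle until it reaches an $ϵ$-stationary point. It is not taken from the paper: the test function, the Gaussian noise, and the names `grad_f`, `stochastic_grad`, and `sgd_until_stationary` are all illustrative assumptions. The paper's bounds concern worst-case functions, so query counts on this easy example say nothing about the $ϵ^{-4}$ rate.

```python
import numpy as np


def grad_f(x):
    # Exact gradient of the assumed test function
    # f(x) = sum_i x_i^2 / (1 + x_i^2), which is smooth and non-convex.
    return 2.0 * x / (1.0 + x**2) ** 2


def stochastic_grad(x, sigma, rng):
    # Unbiased oracle (assumed form): true gradient plus zero-mean
    # Gaussian noise whose total variance across coordinates is sigma^2.
    noise = rng.normal(0.0, sigma / np.sqrt(x.size), size=x.shape)
    return grad_f(x) + noise


def sgd_until_stationary(x0, eps, sigma, step, max_queries, seed=0):
    # Run plain SGD, counting oracle queries, and stop once the TRUE
    # gradient norm is at most eps. The stopping check uses the exact
    # gradient for illustration only; a real algorithm cannot observe it.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    queries = 0
    while queries < max_queries:
        if np.linalg.norm(grad_f(x)) <= eps:
            break
        x = x - step * stochastic_grad(x, sigma, rng)
        queries += 1
    return x, queries, float(np.linalg.norm(grad_f(x)))


if __name__ == "__main__":
    x, queries, gnorm = sgd_until_stationary(
        x0=[1.0, -1.0], eps=0.05, sigma=0.1, step=0.01, max_queries=100_000
    )
    print(f"queries used: {queries}, final gradient norm: {gnorm:.4f}")
```

The function returns the final iterate, the number of oracle queries used, and the true gradient norm at that iterate, so a caller can check whether the query budget ran out before $ϵ$-stationarity was reached.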

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques