Lower Bounds for Finding Stationary Points II: First-Order Methods

Yair Carmon; John C. Duchi; Oliver Hinder; Aaron Sidford

arXiv:1711.00841·math.OC·November 3, 2017·Math. Program.

TL;DR

This paper proves lower bounds on the complexity of finding $\epsilon$-stationary points of smooth, high-dimensional non-convex functions with deterministic first-order methods.

Contribution

It proves that no deterministic first-order method can converge faster than $\epsilon^{-8/5}$, even on arbitrarily smooth functions, and shows this bound is within $\epsilon^{-1/15} \log \frac{1}{\epsilon}$ of the best known rate for such methods.

Findings

01

Deterministic first-order methods cannot achieve a convergence rate better than $\epsilon^{-8/5}$, even on arbitrarily smooth functions.

02

For functions with Lipschitz first and second derivatives, the lower bound is $\epsilon^{-12/7}$.

03

For convex functions with Lipschitz gradient, accelerated gradient descent achieves the faster rate $\epsilon^{-1} \log(1/\epsilon)$.

Abstract

We establish lower bounds on the complexity of finding $\epsilon$-stationary points of smooth, non-convex high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in $\epsilon$ better than $\epsilon^{-8/5}$, which is within $\epsilon^{-1/15} \log \frac{1}{\epsilon}$ of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove no deterministic first-order method can achieve convergence rates better than $\epsilon^{-12/7}$, while $\epsilon^{-2}$ is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves the rate $\epsilon^{-1} \log \frac{1}{\epsilon}$, showing that finding stationary points is easier given convexity.
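To make the exponents in the abstract easier to compare, here is a minimal Python sketch that tabulates them with exact fractions. The names `LOWER_BOUND_EXPONENTS`, `GAP`, `implied_upper_exponent`, and `rate` are illustrative and do not come from the paper, and constants and problem-dependent factors are ignored. The only derived quantity is the exponent of the best known rate, which follows from the abstract's statement that the $\epsilon^{-8/5}$ bound is within $\epsilon^{-1/15} \log \frac{1}{\epsilon}$ of it.

```python
from fractions import Fraction
import math

# Exponents p of the lower bounds eps^{-p} quoted in the abstract,
# keyed by the smoothness assumption (labels are illustrative).
LOWER_BOUND_EXPONENTS = {
    "Lipschitz gradient only": Fraction(2),
    "Lipschitz first and second derivatives": Fraction(12, 7),
    "arbitrarily smooth": Fraction(8, 5),
}

# The abstract states the 8/5 bound is within eps^{-1/15} log(1/eps)
# of the best known rate, so the implied exponent of that rate is:
GAP = Fraction(1, 15)
implied_upper_exponent = LOWER_BOUND_EXPONENTS["arbitrarily smooth"] + GAP  # 5/3


def rate(eps, exponent, log_factor=False):
    """Return eps^{-exponent}, optionally times log(1/eps); constants ignored."""
    value = eps ** (-float(exponent))
    return value * math.log(1.0 / eps) if log_factor else value


if __name__ == "__main__":
    eps = 1e-3
    for name, p in LOWER_BOUND_EXPONENTS.items():
        print(f"{name}: eps^-{p} = {rate(eps, p):.3g}")
    print(f"implied best known rate exponent: {implied_upper_exponent}")
    print(f"convex, accelerated: {rate(eps, 1, log_factor=True):.3g}")
```

Exact fractions keep the $8/5 + 1/15 = 5/3$ arithmetic free of rounding; floats are used only when evaluating a rate at a concrete $\epsilon$.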
