On the Lower Bound of Minimizing Polyak-Łojasiewicz Functions

Pengyun Yue; Cong Fang; Zhouchen Lin

arXiv:2212.13551 · math.OC · August 3, 2023 · 5 cites

Open Access

TL;DR

This paper establishes a lower bound on the gradient complexity of first-order algorithms for minimizing smooth Polyak-Łojasiewicz (PL) functions. The bound matches the rate of Gradient Descent up to a numerical constant, so Gradient Descent is optimal in this setting and acceleration techniques such as momentum cannot improve on it in general.

Contribution

The paper proves a lower bound on the gradient complexity of first-order methods on smooth PL functions. This shows that Gradient Descent is optimal for this class and separates the PL setting from the strongly convex setting, where momentum is known to help.
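For context, the matching upper bound for Gradient Descent follows from a short, standard argument, sketched here. The normalization of the PL inequality (the factor $\tfrac{1}{2}$), the step size $1/L$, and the symbols $x_k$ and $f^*$ are assumptions of this sketch and may differ from the paper's own conventions.

```latex
% Assumptions: f is L-smooth, f^* = \min_x f(x), and f satisfies the PL inequality
%   \tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu \,(f(x) - f^*)  for all x.
% Gradient Descent step: x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k).
\begin{align*}
f(x_{k+1}) &\le f(x_k) - \frac{1}{2L}\,\|\nabla f(x_k)\|^2
  && \text{($L$-smoothness, step size $1/L$)} \\
f(x_{k+1}) - f^* &\le \Bigl(1 - \frac{\mu}{L}\Bigr)\bigl(f(x_k) - f^*\bigr)
  && \text{(PL inequality)} \\
f(x_k) - f^* &\le \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\bigl(f(x_0) - f^*\bigr)
  \le e^{-k\mu/L}\,\bigl(f(x_0) - f^*\bigr)
  && \text{(unrolling the recursion)}
\end{align*}
% Hence k \ge \frac{L}{\mu}\log\frac{f(x_0) - f^*}{\varepsilon} iterations suffice
% for f(x_k) - f^* \le \varepsilon.
```

Under these assumptions Gradient Descent needs $O\left(\frac{L}{\mu} \log \frac{1}{\varepsilon}\right)$ gradient evaluations, which is the rate that a lower bound of the same order shows cannot be improved beyond a constant factor.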

Findings

01

Gradient Descent is optimal, up to a numerical constant, among first-order methods for minimizing smooth PL functions.

02

Any first-order algorithm requires at least Ω((L/μ) log(1/ε)) gradient evaluations to find an ε-approximate optimal solution of a general L-smooth function with PL constant μ.

03

Acceleration techniques such as momentum cannot beat this lower bound on general smooth PL functions, so they cannot improve on Gradient Descent by more than a constant factor.
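As a small numerical illustration of the upper-bound side of these findings, the sketch below runs Gradient Descent on a non-convex function that satisfies the PL inequality. The test function f(x) = x² + 3 sin²(x), the constants `L_SMOOTH` and `MU_PL`, and all function names are assumptions chosen for illustration; this is not the hard instance constructed in the paper, and it says nothing about the lower bound itself.

```python
import math

# Illustrative test function (an assumption, not taken from the paper):
#   f(x) = x^2 + 3*sin(x)^2 is non-convex, has its unique minimum f* = 0 at x = 0,
#   and f''(x) = 2 + 6*cos(2x) lies in [-4, 8], so f is L-smooth with L = 8.
L_SMOOTH = 8.0
# Assumed conservative PL constant for this f, using the normalization
#   0.5 * f'(x)^2 >= mu * (f(x) - f*).
MU_PL = 1.0 / 32


def f(x):
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2 x] = 2x + 6 sin x cos x = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def pl_ratio(x):
    """Return 0.5*f'(x)^2 / (f(x) - f*) for x != 0; PL requires this to be >= mu."""
    return 0.5 * grad_f(x) ** 2 / f(x)


def gradient_descent(x0, num_steps, step_size=1.0 / L_SMOOTH):
    """Run plain Gradient Descent and record the optimality gap f(x_k) - f*."""
    x = x0
    gaps = [f(x)]
    for _ in range(num_steps):
        x -= step_size * grad_f(x)
        gaps.append(f(x))
    return x, gaps


if __name__ == "__main__":
    x_final, gaps = gradient_descent(x0=3.0, num_steps=20)
    for k, gap in enumerate(gaps):
        # Worst-case guarantee for an L-smooth, mu-PL function with step size 1/L.
        bound = (1.0 - MU_PL / L_SMOOTH) ** k * gaps[0]
        print(f"k={k:2d}  gap={gap:.3e}  bound={bound:.3e}")
```

On this particular function the iterates converge much faster than the worst-case rate (1 - μ/L)^k, which is expected: the Ω((L/μ) log(1/ε)) bound is a statement about the hardest function in the class, not about every smooth PL function.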

Abstract

The Polyak-Łojasiewicz (PL) condition [Polyak, 1963] is weaker than strong convexity but suffices to ensure global convergence of the Gradient Descent algorithm. In this paper, we study the lower bound for algorithms that use first-order oracles to find an approximate optimal solution. We show that any first-order algorithm requires at least $\Omega\left(\frac{L}{\mu} \log \frac{1}{\varepsilon}\right)$ gradient costs to find an $\varepsilon$-approximate optimal solution for a general $L$-smooth function with PL constant $\mu$. This result demonstrates the optimality of the Gradient Descent algorithm for minimizing smooth PL functions, in the sense that there exists a "hard" PL function on which no first-order algorithm can be faster than Gradient Descent when ignoring a numerical constant. In contrast, it is well-known that the momentum technique, e.g. [Nesterov,…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research