Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition

Hamed Karimi; Julie Nutini; Mark Schmidt

arXiv:1608.04636·cs.LG·September 15, 2020·1 cites

Open Access

TL;DR

This paper shows that the Polyak-Łojasiewicz (PL) inequality, a condition dating to 1963, is weaker than the main conditions explored more recently for proving linear convergence without strong convexity, and uses it to analyze a range of gradient-based optimization methods used in machine learning.

Contribution

The paper shows that the PL inequality applies more broadly than recent alternatives as a tool for proving linear convergence, and it extends the analysis to proximal-gradient methods for non-smooth problems.
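For reference, the PL inequality and the rate it yields for gradient descent can be written as below. The notation is an assumption made here rather than something stated on this page: $f$ has an $L$-Lipschitz continuous gradient, $f^*$ is its optimal value, and $\mu > 0$ is the PL constant.

```latex
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x,
\qquad
f(x_k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\bigl(f(x_0) - f^*\bigr)
\quad \text{with } x_{k+1} = x_k - \tfrac{1}{L}\,\nabla f(x_k).
```

The inequality asks only that the squared gradient norm grow at least as fast as the suboptimality gap, which is why it can hold for functions that are neither strongly convex nor convex.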

Findings

01

Gives new analyses under the PL inequality for randomized and greedy coordinate descent, sign-based gradient descent, and stochastic gradient methods.

02

Provides simple convergence proofs for various machine learning models.

03

Extends analysis to non-smooth optimization with proximal-gradient methods.

Abstract

In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Łojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of randomized and greedy coordinate descent methods, sign-based gradient descent methods, and stochastic gradient methods in the classic setting (with decreasing or constant step-sizes) as well as the variance-reduced setting. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization,…
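The claim that gradient descent converges linearly without strong convexity can be checked numerically. The sketch below is an illustration, not code from the paper: it assumes a hypothetical underdetermined least-squares problem (the matrix `A`, vector `b`, and all variable names are made up here), takes the PL constant `mu` to be the smallest nonzero eigenvalue of `A^T A`, and compares the observed suboptimality gap against the bound `(1 - mu/L)^k * (f(x_0) - f*)` for step size `1/L`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test problem: f(x) = 0.5 * ||A x - b||^2 with more columns than
# rows. A^T A is singular, so f is convex but not strongly convex.
A = rng.standard_normal((5, 10))
b = rng.standard_normal(5)

def f(x):
    r = A @ x - b
    return 0.5 * float(r @ r)

def grad(x):
    return A.T @ (A @ x - b)

# The nonzero eigenvalues of A^T A equal the eigenvalues of A A^T
# (A is assumed to have full row rank, which holds almost surely here).
eigs = np.linalg.eigvalsh(A @ A.T)
L = float(eigs.max())   # Lipschitz constant of the gradient
mu = float(eigs.min())  # PL constant for this least-squares problem
rate = 1.0 - mu / L

# Optimal value from a least-squares solve.
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
f_star = f(x_star)

# Gradient descent with the constant step size 1/L.
x = np.zeros(A.shape[1])
gaps = [f(x) - f_star]
for _ in range(200):
    x = x - grad(x) / L
    gaps.append(f(x) - f_star)

# Compare the observed gap with the bound (1 - mu/L)^k * (f(x_0) - f*).
for k in (0, 10, 50, 200):
    print(k, gaps[k], rate**k * gaps[0])
```

On this problem the observed gap should sit at or below the bound at every iteration, even though the Hessian `A^T A` has a nontrivial null space.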

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems