Linear Convergence Rate in Convex Setup is Possible! Gradient Descent Method Variants under $(L_0,L_1)$-Smoothness

Aleksandr Lobanov; Alexander Gasnikov; Eduard Gorbunov; Martin Takáč

arXiv:2412.17050·math.OC·February 20, 2025


TL;DR

This paper shows that, in convex optimization under the generalized $(L_0,L_1)$-smoothness assumption, gradient descent methods converge linearly in function suboptimality while the gradient norm is at least $\frac{L_0}{L_1}$, and at the standard sublinear rate once it falls below that threshold.

Contribution

It provides a refined convergence analysis showing linear rates for $(L_0,L_1)$-GD and its variants under generalized smoothness in convex and strongly convex settings.

Findings

01

$(L_0,L_1)$-GD converges linearly while the gradient norm is at least $\frac{L_0}{L_1}$.

02

Variants like Normalized, Clipped, and Coordinate Descent share this behavior.

03

The analysis extends to strongly convex functions.

Abstract

The gradient descent (GD) method is a fundamental and likely the most popular optimization algorithm in machine learning (ML), with a history that traces back to a paper from 1847 (Cauchy, 1847). It has been studied under various assumptions, including so-called $(L_{0}, L_{1})$-smoothness, which has recently received noticeable attention in the ML community. In this paper, we provide a refined convergence analysis of gradient descent and its variants, assuming generalized smoothness. In particular, we show that $(L_{0}, L_{1})$-GD has the following behavior in the convex setup: as long as $\|\nabla f(x^{k})\| \geq \frac{L_{0}}{L_{1}}$, the algorithm converges linearly in function suboptimality, and when $\|\nabla f(x^{k})\| < \frac{L_{0}}{L_{1}}$, $(L_{0}, L_{1})$-GD has the standard sublinear rate. Moreover, we also show that this behavior is common for its variants with different types of oracle: Normalized…
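The two regimes described in the abstract can be observed on a toy problem. The sketch below is illustrative only and rests on assumptions that are not stated in this excerpt: the step size $\eta_k = 1/(L_0 + L_1 \|\nabla f(x^k)\|)$ is one common choice for gradient descent under $(L_0,L_1)$-smoothness and may differ from the paper's exact rule, and the helper name `l0l1_gd` is hypothetical. The test function $f(x) = \cosh(x)$ is convex, has minimum value $1$ at $x = 0$, and satisfies $|f''(x)| \leq 1 + |f'(x)|$, so it is treated here as $(L_0,L_1)$-smooth with $L_0 = L_1 = 1$.

```python
import math


def l0l1_gd(grad, x0, L0, L1, num_steps):
    """Scalar gradient descent with step size 1 / (L0 + L1 * |grad|).

    The step-size rule is an assumption made for this sketch; it is not
    quoted from the paper.
    """
    x = x0
    trajectory = [x]
    for _ in range(num_steps):
        g = grad(x)
        step = 1.0 / (L0 + L1 * abs(g))  # shrinks automatically when |g| is large
        x = x - step * g
        trajectory.append(x)
    return trajectory


# Toy problem (assumption): f(x) = cosh(x), f'(x) = sinh(x), f* = 1 at x = 0.
f = math.cosh
grad_f = math.sinh
L0, L1 = 1.0, 1.0

traj = l0l1_gd(grad_f, x0=5.0, L0=L0, L1=L1, num_steps=30)
gaps = [f(x) - 1.0 for x in traj]

for k, (x, gap) in enumerate(zip(traj, gaps)):
    # Label each iterate by which side of the L0 / L1 threshold its gradient is on.
    regime = "large-gradient" if abs(grad_f(x)) >= L0 / L1 else "small-gradient"
    print(f"k={k:2d}  x={x: .3e}  f(x)-f*={gap:.3e}  {regime}")
```

Printing the regime label next to the suboptimality gap makes it easy to see where the iterates cross the $\frac{L_0}{L_1}$ threshold. A one-dimensional example says nothing about the general rates, which are the subject of the paper's analysis.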


Taxonomy

Topics: Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques · Iterative Methods for Nonlinear Equations