Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak-Łojasiewicz condition

Hyung Jun Choi; Woocheol Choi; Jinmyoung Seok

arXiv:2308.11984 · math.OC · February 23, 2024

Open Access

TL;DR

This paper proves non-ergodic linear convergence of delayed gradient descent for strongly convex functions and for functions satisfying the PL condition. It extends previous results under weaker assumptions and with larger learning rates, and also covers stochastic variants.

Contribution

It establishes non-ergodic linear convergence rates for delayed gradient descent under weaker assumptions and with larger admissible learning rates, and extends these rates to stochastic gradient descent with delays.
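As a reading aid, delayed gradient descent with a fixed delay $\tau$ is commonly written as the first line below; the convention $x_s := x_0$ for $s < 0$ is an assumption made here, not taken from the paper. The second line contrasts, schematically, the two kinds of guarantee: a non-ergodic bound controls a quantity of the last iterate $x_T$ itself (for example its distance to the minimizer $x^\ast$), whereas an ergodic bound controls a weighted average over $x_0, \dots, x_T$. The constants $C > 0$, $\rho \in (0,1)$ and the weights $w_t \ge 0$ are generic placeholders, not the paper's constants.

```latex
x_{t+1} = x_t - \eta\,\nabla f(x_{t-\tau}), \qquad t = 0, 1, 2, \dots, \qquad x_s := x_0 \ \text{for } s < 0.

\text{non-ergodic: } \|x_T - x^\ast\|^2 \le C\rho^{T},
\qquad
\text{ergodic: } \frac{1}{W_T}\sum_{t=0}^{T} w_t\bigl(f(x_t) - f^\ast\bigr) \le C\rho^{T},
\quad W_T = \sum_{t=0}^{T} w_t.
```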

Findings

01

Linear convergence for delayed gradient descent under strong convexity.

02

Extended convergence results under the Polyak-Łojasiewicz condition.

03

Numerical experiments confirming theoretical results.
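Finding 02 concerns the Polyak-Łojasiewicz (PL) inequality $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^\ast)$, which does not require convexity. The sketch below is a hypothetical illustration, not one of the paper's experiments. It runs delayed gradient descent on $f(x) = x^2 + 3\sin^2 x$, a standard one-dimensional example of a nonconvex function that satisfies PL. Its minimum value is $f^\ast = 0$ at $x = 0$, and $f''(x) = 2 + 6\cos 2x$ is negative near $x = \pi/2$. Because $|f''| \le 8$, we take $L = 8$. The step $\eta = 1/(4L\tau)$, the delay $\tau = 3$, the start $x_0 = 3$, and the convention $x_s = x_0$ for $s < 0$ are all assumptions chosen for illustration; they are not the paper's admissible range under PL.

```python
import math


def f(x: float) -> float:
    # Nonconvex PL example: f(x) = x^2 + 3 sin^2(x), with f* = 0 attained at x = 0.
    return x * x + 3.0 * math.sin(x) ** 2


def f_prime(x: float) -> float:
    # d/dx [x^2 + 3 sin^2 x] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def delayed_gd_1d(x0: float, eta: float, tau: int, num_iters: int) -> list:
    """x_{t+1} = x_t - eta * f'(x_{t-tau}); assumes x_s = x0 for s < 0."""
    xs = [x0]
    for t in range(num_iters):
        stale = xs[max(t - tau, 0)]  # x_{t-tau}, clamped to x0 while t < tau
        xs.append(xs[t] - eta * f_prime(stale))
    return xs


L = 8.0                      # |f''(x)| = |2 + 6 cos(2x)| <= 8, so f' is 8-Lipschitz
tau = 3                      # hypothetical delay
eta = 1.0 / (4.0 * L * tau)  # hypothetical conservative step size
xs = delayed_gd_1d(3.0, eta, tau, num_iters=2000)
gaps = [f(x) for x in xs]    # f(x_t) - f*, since f* = 0
print(f"gap at t=0: {gaps[0]:.3e}, t=500: {gaps[500]:.3e}, t=2000: {gaps[-1]:.3e}")
```

The gap is measured at the last iterate $x_t$ rather than at an average of iterates, so a shrinking gap here is non-ergodic behaviour in the sense of the findings above.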

Abstract

In this work, we establish a linear convergence estimate for gradient descent with a delay $\tau \in \mathbb{N}$ when the cost function is $\mu$-strongly convex and $L$-smooth. This result improves upon the well-known estimates of Arjevani et al. \cite{ASS} and Stich-Karimireddy \cite{SK} in that it is non-ergodic and holds under weaker assumptions on the cost function. Moreover, the admissible range of the learning rate $\eta$ is extended from $\eta \leq 1/(10 L\tau)$ to $\eta \leq 1/(4 L\tau)$ for $\tau = 1$ and to $\eta \leq 3/(10 L\tau)$ for $\tau \geq 2$, where $L > 0$ is the Lipschitz constant of the gradient of the cost function. Furthermore, we show the linear convergence of the cost function under the Polyak-Łojasiewicz (PL) condition, for which the admissible learning rate is further improved to $\eta \leq 9/(10 L\tau)$ for the large delay…
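The strongly convex result and its step-size ranges can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' code or experiments. It encodes the learning-rate ranges quoted in the abstract ($\eta \le 1/(4L\tau)$ for $\tau = 1$, $\eta \le 3/(10L\tau)$ for $\tau \ge 2$) in a hypothetical helper `max_step_size`. It then runs the delayed update $x_{t+1} = x_t - \eta\nabla f(x_{t-\tau})$ on a hypothetical diagonal quadratic $f(x) = \frac{1}{2}\sum_i a_i x_i^2$, which is $\mu$-strongly convex and $L$-smooth with $\mu = \min_i a_i$ and $L = \max_i a_i$. Iterates with negative index are assumed equal to $x_0$.

```python
import math
from collections import deque


def max_step_size(L: float, tau: int) -> float:
    """Upper end of the learning-rate range quoted in the abstract for the
    mu-strongly convex, L-smooth case (hypothetical helper name)."""
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    if tau == 1:
        return 1.0 / (4.0 * L * tau)   # eta <= 1/(4 L tau) for tau = 1
    return 3.0 / (10.0 * L * tau)      # eta <= 3/(10 L tau) for tau >= 2


def delayed_gd(grad, x0, eta, tau, num_iters):
    """Delayed gradient descent x_{t+1} = x_t - eta * grad(x_{t-tau}).

    Assumption: x_s = x0 for every s < 0. Returns [x_0, x_1, ..., x_T].
    """
    # Holds x_{t-tau}, ..., x_t; history[0] is the stale iterate x_{t-tau}.
    history = deque((list(x0) for _ in range(tau + 1)), maxlen=tau + 1)
    iterates = [list(x0)]
    for _ in range(num_iters):
        x_t = history[-1]
        g = grad(history[0])           # gradient evaluated at the delayed iterate
        x_next = [xi - eta * gi for xi, gi in zip(x_t, g)]
        history.append(x_next)         # maxlen drops the oldest entry automatically
        iterates.append(x_next)
    return iterates


# Hypothetical test problem: f(x) = 0.5 * sum_i a_i * x_i^2, minimizer x* = 0.
a = [1.0, 3.0, 10.0]
mu, L = min(a), max(a)


def grad(x):
    return [ai * xi for ai, xi in zip(a, x)]


def dist_to_opt(x):
    return math.sqrt(sum(xi * xi for xi in x))


tau = 4
eta = max_step_size(L, tau)
print(f"mu={mu}, L={L}, tau={tau}, eta={eta}")
iterates = delayed_gd(grad, [1.0, -2.0, 0.5], eta, tau, num_iters=3000)
for t in (0, 500, 1000, 2000, 3000):
    print(f"t={t:4d}  ||x_t - x*|| = {dist_to_opt(iterates[t]):.3e}")
```

The printed distances refer to the last iterate $x_t$, not to an average of iterates. A deque of length $\tau + 1$ is used because each step needs only the single stale iterate $x_{t-\tau}$, so memory stays proportional to the delay.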

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Numerical methods in inverse problems