Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak-Łojasiewicz condition
Hyung Jun Choi, Woocheol Choi, Jinmyoung Seok

TL;DR
This paper proves non-ergodic linear convergence of delayed gradient descent for strongly convex functions and for functions satisfying the Polyak-Łojasiewicz (PL) condition. It extends previous results under weaker assumptions and with larger learning rates, and also covers stochastic variants.
Contribution
It establishes non-ergodic linear convergence rates for delayed gradient descent under weaker conditions and with larger learning rates, and extends the analysis to stochastic gradient descent with delays.
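For reference, the standard forms of the objects named above are collected below as a sketch. The fixed-delay update and the convention that iterates with negative index equal x_0 are assumptions made here, and the constants C and ρ are generic placeholders, not the paper's.

```latex
% Delayed gradient descent with fixed delay \tau \in \mathbb{N} and step size \eta > 0
x_{k+1} = x_k - \eta \nabla f(x_{k-\tau}), \qquad k \ge 0, \qquad x_j := x_0 \ \text{for } j < 0.

% L-smoothness and \mu-strong convexity
\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|, \qquad
f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{\mu}{2}\|y - x\|^2.

% Polyak-Lojasiewicz (PL) condition, with f^* = \min f
\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu \bigl(f(x) - f^*\bigr).

% Non-ergodic linear convergence: a bound on the last iterate itself
% (not on an average of iterates), for some C > 0 and \rho \in (0, 1)
\|x_k - x^*\|^2 \le C \rho^k \quad \text{or} \quad f(x_k) - f^* \le C \rho^k.
```

A μ-strongly convex function also satisfies the PL condition with the same μ, so the PL setting is the more general of the two.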
Findings
Linear convergence for delayed gradient descent under strong convexity.
Extended convergence results under the Polyak-Łojasiewicz (PL) condition.
Numerical experiments confirming theoretical results.
Abstract
In this work, we establish a linear convergence estimate for gradient descent with delay τ ∈ ℕ when the cost function is μ-strongly convex and L-smooth. This result improves upon the well-known estimates of Arjevani et al. \cite{ASS} and Stich and Karimireddy \cite{SK} in that it is non-ergodic and holds under weaker assumptions on the cost function. In addition, the admissible range of the learning rate η can be extended from η ≤ 1/(10Lτ) to η ≤ 1/(4Lτ) for τ = 1 and η ≤ 3/(10Lτ) for τ ≥ 2, where L > 0 is the Lipschitz constant of the gradient of the cost function. We further show linear convergence of the cost function under the Polyak-Łojasiewicz (PL) condition, for which the admissible learning rate is further improved to η ≤ 9/(10Lτ) for the large delay…
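The delayed iteration x_{k+1} = x_k − η∇f(x_{k−τ}) can be sketched as below, assuming a fixed delay τ and the convention x_j = x_0 for j < 0. This is an illustration, not the authors' code or experiments. The function names (delayed_gd, quad_grad, pl_f, pl_grad) are hypothetical. The test problems are illustrative choices: a diagonal quadratic with μ = 1 and L = 10, and f(x) = x² + 3 sin²(x), a commonly cited nonconvex function that satisfies the PL condition (μ = 1/32, L = 8). The step size η = 0.01 is picked by hand and is not one of the paper's bounds. Printing the last iterate rather than an average is what "non-ergodic" refers to.

```python
import math
from collections import deque


def delayed_gd(grad, x0, eta, tau, num_iters):
    """Iterate x_{k+1} = x_k - eta * grad(x_{k - tau}).

    Assumes x_j = x0 for j < 0 (hypothetical convention). Works on tuples
    of floats and returns every iterate x_0, ..., x_{num_iters}.
    """
    x = tuple(x0)
    # history holds x_{k - tau}, ..., x_k; history[0] is the stale iterate.
    # maxlen makes append() drop the oldest entry automatically.
    history = deque([x] * (tau + 1), maxlen=tau + 1)
    iterates = [x]
    for _ in range(num_iters):
        g = grad(history[0])  # gradient evaluated at the delayed point
        x = tuple(xi - eta * gi for xi, gi in zip(x, g))
        history.append(x)
        iterates.append(x)
    return iterates


# Example 1 (illustrative): mu-strongly convex, L-smooth quadratic
# f(x) = (MU * x1^2 + L * x2^2) / 2 with minimizer x* = 0.
MU, L = 1.0, 10.0


def quad_grad(x):
    return (MU * x[0], L * x[1])


# Example 2 (illustrative): f(x) = x^2 + 3 sin^2(x), nonconvex since
# f''(x) = 2 + 6 cos(2x) takes negative values, but PL with mu = 1/32
# and L = 8; its minimizer is x* = 0 with f* = 0.
def pl_f(x):
    return x[0] ** 2 + 3.0 * math.sin(x[0]) ** 2


def pl_grad(x):
    return (2.0 * x[0] + 3.0 * math.sin(2.0 * x[0]),)


if __name__ == "__main__":
    quad = delayed_gd(quad_grad, (1.0, 1.0), eta=0.01, tau=3, num_iters=1000)
    for k in (0, 250, 500, 1000):
        print(k, math.hypot(*quad[k]))  # distance of x_k itself to x* = 0

    pl = delayed_gd(pl_grad, (3.0,), eta=0.01, tau=2, num_iters=2000)
    for k in (0, 500, 1000, 2000):
        print(k, pl_f(pl[k]))  # optimality gap f(x_k) - f*
```

Setting tau=0 recovers plain gradient descent, which makes it easy to compare how the delay slows the contraction for a fixed step size.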
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Numerical methods in inverse problems
