Asynchronous Stochastic Optimization Robust to Arbitrary Delays

Alon Cohen; Amit Daniely; Yoel Drori; Tomer Koren; Mariano Schain

arXiv:2106.11879 · math.OC · November 16, 2021 · 1 citation

TL;DR

This paper introduces an efficient stochastic optimization algorithm that is robust to arbitrary and variable delays in gradient updates, improving convergence guarantees over previous methods that depended on maximum delay.

Contribution

The paper presents a simple, efficient algorithm for non-convex stochastic optimization that depends on average delay, not maximum delay, enhancing robustness in asynchronous distributed systems.

Findings

01

The algorithm finds an $\epsilon$-stationary point in $O\left(\frac{\sigma^2}{\epsilon^4} + \frac{\tau}{\epsilon^2}\right)$ steps.

02

Outperforms previous methods by depending on average delay $\tau$ instead of maximum delay.

03

Demonstrates robustness in experiments with skewed and heavy-tailed delay distributions.
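To make the gap between average and maximum delay concrete, the sketch below evaluates the expression $\sigma^2/\epsilon^4 + \tau/\epsilon^2$ for one skewed delay sequence, once with the average delay and once with the maximum delay. The delay sequence, the values of `sigma` and `eps`, and the helper names `delay_stats` and `step_bound` are illustrative assumptions that do not come from the paper, and the constant hidden by the $O(\cdot)$ notation is simply taken to be 1.

```python
def delay_stats(delays):
    """Return (average delay, maximum delay) of a delay sequence."""
    return sum(delays) / len(delays), max(delays)


def step_bound(sigma, eps, tau):
    """Evaluate sigma^2/eps^4 + tau/eps^2, ignoring the constant hidden by O(.)."""
    return sigma**2 / eps**4 + tau / eps**2


# Hypothetical skewed delays: almost every update is one step stale,
# but a single straggler arrives 900 steps late.
delays = [0] + [1] * 998 + [900]
tau_avg, tau_max = delay_stats(delays)

sigma, eps = 1.0, 0.1  # illustrative values only
bound_avg = step_bound(sigma, eps, tau_avg)
bound_max = step_bound(sigma, eps, tau_max)

print(f"average delay = {tau_avg:.3f}, maximum delay = {tau_max}")
print(f"expression with average delay: {bound_avg:.1f}")
print(f"expression with maximum delay: {bound_max:.1f}")
```

With a single straggler, the maximum delay is hundreds of times larger than the average, so an expression stated in terms of the maximum delay grows accordingly while the one stated in terms of the average barely moves.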

Abstract

We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_{t}$ for some arbitrary delay $d_{t}$. This setting abstracts asynchronous distributed optimization where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that might vary significantly over time. In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O(\sigma^{2} / \epsilon^{4} + \tau / \epsilon^{2})$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the average delay $\frac{1}{T} \sum_{t = 1}^{T} d_{t}$ and $\sigma^{2}$ is the variance of the stochastic gradients. This improves over previous work, which showed that stochastic gradient descent achieves the same…
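The delayed-gradient setting described in the abstract can be sketched as a short simulation. The code below is plain SGD driven by stale gradients, not the algorithm proposed in the paper; the function name `delayed_sgd`, the one-dimensional quadratic objective, the step size, and the delay sequence are all assumptions chosen for illustration.

```python
import random


def delayed_sgd(grad, x0, delays, lr, noise_std=0.0, seed=0):
    """Run SGD where step t uses a noisy gradient evaluated at the iterate
    from d_t steps earlier. Returns the list of all iterates."""
    rng = random.Random(seed)
    iterates = [x0]
    for t, d in enumerate(delays):
        d = min(d, t)  # cannot look back before the initial point
        stale_x = iterates[t - d]  # iterate the worker started from
        g = grad(stale_x) + rng.gauss(0.0, noise_std)  # stochastic gradient
        iterates.append(iterates[t] - lr * g)  # server applies the stale update
    return iterates


# Hypothetical objective f(x) = x^2 / 2, whose gradient is x.
def grad_f(x):
    return x


delays = [0] + [1] * 49  # every update after the first is one step stale
xs = delayed_sgd(grad_f, x0=1.0, delays=delays, lr=0.1)
tau = sum(delays) / len(delays)  # average delay, as defined in the abstract
print(f"average delay = {tau:.2f}, final iterate = {xs[-1]:.6f}")
```

Setting `noise_std` above zero adds Gaussian noise to each gradient, which stands in for the variance $\sigma^{2}$ of the stochastic gradients.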

Videos

Asynchronous Stochastic Optimization Robust to Arbitrary Delays · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data