Distributed Delayed Stochastic Optimization

Alekh Agarwal; John C. Duchi

arXiv:1104.5525·math.OC·May 2, 2011·NeurIPS·105 cites

Open Access

TL;DR

This paper analyzes the convergence of stochastic gradient algorithms that base their updates on delayed gradients in distributed settings. It shows that delays are asymptotically negligible for smooth problems and proposes procedures that overcome communication bottlenecks while achieving order-optimal convergence rates.

Contribution

The paper demonstrates that delays in distributed stochastic optimization are asymptotically negligible for smooth problems, and it develops algorithms that attain order-optimal convergence despite asynchrony.

Findings

01

Delays do not affect asymptotic convergence in smooth stochastic problems.

02

Distributed algorithms on n nodes can achieve the optimal rate of O(1/√(nT)) after T iterations, despite delays.

03

Proposed methods improve communication efficiency in distributed optimization.

Abstract

We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. In application to distributed optimization, we develop procedures that…
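
The delayed-update idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes a toy smooth objective f(x) = 0.5 · E[(x − z)²] with z drawn from N(mu, 1), a fixed delay tau, and a step size 1/(t + 1 + tau), all chosen here for illustration. The function name delayed_sgd and its parameters are hypothetical. Each update uses a gradient evaluated at the iterate from tau steps earlier, which mimics a master node applying gradients that workers computed from stale parameters.

```python
import random
from collections import deque


def delayed_sgd(tau, steps, mu=3.0, seed=0):
    """Minimize f(x) = 0.5 * E[(x - z)^2], z ~ N(mu, 1), using stochastic
    gradients evaluated at the iterate from `tau` steps earlier.

    Illustrative toy setup (assumed, not taken from the paper).
    """
    rng = random.Random(seed)
    x = 0.0
    # Sliding window of the last tau + 1 iterates; history[0] is the stale one.
    history = deque([x] * (tau + 1), maxlen=tau + 1)
    for t in range(steps):
        z = rng.gauss(mu, 1.0)        # fresh sample seen by a "worker"
        stale_x = history[0]          # x_{t - tau} (x_0 while t < tau)
        grad = stale_x - z            # stochastic gradient at the stale point
        step = 1.0 / (t + 1 + tau)    # assumed decaying step size
        x -= step * grad              # "master" applies the delayed gradient
        history.append(x)             # oldest iterate drops out automatically
    return x


if __name__ == "__main__":
    for tau in (0, 10):
        x_final = delayed_sgd(tau, steps=20000)
        print(f"tau={tau:2d}  |x - mu| = {abs(x_final - 3.0):.4f}")
```

With tau = 0 the recursion reduces to a running mean of the samples. Comparing it against a positive delay over a long run gives a rough, informal feel for the claim that delays matter little asymptotically on smooth problems; it is not a substitute for the paper's analysis.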

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Sparse and Compressive Sensing Techniques