Asynchronous Decentralized SGD with Quantized and Local Updates
Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh

TL;DR
This paper introduces SwarmSGD, an asynchronous decentralized optimization algorithm that combines quantization, local updates, and non-blocking communication, and proves that it converges in the asynchronous gossip model, where nodes communicate only in randomly chosen pairs.
Contribution
It provides the first convergence analysis of asynchronous decentralized SGD with quantization and local steps in heterogeneous, gossip-based settings.
Findings
SwarmSGD converges despite asynchronous, quantized, and local updates.
The algorithm outperforms previous decentralized methods in training time.
SwarmSGD can match large-batch SGD performance on certain tasks.
Abstract
Decentralized optimization is emerging as a viable alternative for scalable distributed machine learning, but also introduces new challenges in terms of synchronization costs. To this end, several communication-reduction techniques, such as non-blocking communication, quantization, and local steps, have been explored in the decentralized setting. Due to the complexity of analyzing optimization in such a relaxed setting, this line of work often assumes global communication rounds, which require additional synchronization. In this paper, we consider decentralized optimization in the simpler, but harder to analyze, asynchronous gossip model, in which communication occurs in discrete, randomly chosen pairings among nodes. Perhaps surprisingly, we show that a variant of SGD called SwarmSGD still converges in this setting, even if non-blocking communication,…
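The gossip model described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' SwarmSGD implementation: it assumes scalar models, a hypothetical quadratic loss 0.5 * (x - target)^2 per node with a different target on each node (to mimic heterogeneous data), and plain pairwise averaging. Quantization and non-blocking communication are deliberately left out, and all function names and parameter values are illustrative choices.

```python
import random


def local_sgd(x, target, lr, steps, rng, noise_std):
    """Run a few noisy gradient steps on the toy loss 0.5 * (x - target)**2."""
    for _ in range(steps):
        # Exact gradient (x - target) plus Gaussian noise stands in for a
        # stochastic gradient computed on a mini-batch.
        grad = (x - target) + rng.gauss(0.0, noise_std)
        x -= lr * grad
    return x


def average_pair(a, b):
    """Gossip step: both nodes adopt the midpoint of their two models."""
    mid = 0.5 * (a + b)
    return mid, mid


def simulate_gossip_sgd(num_nodes=8, interactions=5000, local_steps=2,
                        lr=0.02, noise_std=0.1, seed=0):
    """Simulate randomly paired nodes that take local steps, then average."""
    rng = random.Random(seed)
    # Assumption: node i's local optimum is i, so the local losses disagree.
    targets = [float(i) for i in range(num_nodes)]
    models = [0.0] * num_nodes
    for _ in range(interactions):
        # One interaction: a uniformly random pair of distinct nodes.
        i, j = rng.sample(range(num_nodes), 2)
        models[i] = local_sgd(models[i], targets[i], lr, local_steps, rng, noise_std)
        models[j] = local_sgd(models[j], targets[j], lr, local_steps, rng, noise_std)
        models[i], models[j] = average_pair(models[i], models[j])
    # The average of the local quadratic losses is minimized at the mean target.
    optimum = sum(targets) / num_nodes
    return models, optimum


if __name__ == "__main__":
    models, optimum = simulate_gossip_sgd()
    print("global optimum:", optimum)
    print("node models:", [round(m, 2) for m in models])
```

In this toy setting no global synchronization round ever occurs, yet the node models stay close to one another and hover around the minimizer of the averaged loss. With a fixed step size they fluctuate rather than settle exactly, which is expected for constant-step SGD.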
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Cooperative Communication and Network Coding
Methods: Stochastic Gradient Descent
