Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu

TL;DR
This paper investigates whether decentralized parallel stochastic gradient descent algorithms can outperform centralized ones by reducing communication costs, supported by theoretical analysis and extensive empirical validation across multiple platforms and network conditions.
Contribution
The paper provides the first theoretical analysis identifying a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent.
Findings
D-PSGD can be up to ten times faster than C-PSGD in low bandwidth or high latency networks.
D-PSGD has total computational complexity comparable to that of C-PSGD but a lower communication cost.
Empirical validation across CNTK, Torch, and multiple GPU configurations supports the theoretical results.
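The communication finding above can be illustrated with a rough counting model. This is only a sketch under stated assumptions, not the paper's cost analysis: it counts model-sized messages sent plus received per iteration by the busiest node, assumes a single parameter server for the centralized case, and assumes a ring of at least three workers for the decentralized case. The function name `busiest_node_messages` and the topology labels are hypothetical.

```python
def busiest_node_messages(num_workers: int, topology: str) -> int:
    """Model-sized messages (sent + received) per iteration at the busiest node.

    Simplified counting model (an assumption made for illustration):
    - "parameter_server": one central node receives a gradient from every
      worker and sends the updated model back to every worker.
    - "ring": every worker exchanges its model with its two ring neighbors.
    """
    if num_workers < 3:
        raise ValueError("this sketch assumes at least three workers")
    if topology == "parameter_server":
        # n gradients in, n models out: the load grows linearly with n.
        return 2 * num_workers
    if topology == "ring":
        # 2 models out, 2 models in: the load is independent of n.
        return 4
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for n in (4, 16, 64):
        central = busiest_node_messages(n, "parameter_server")
        ring = busiest_node_messages(n, "ring")
        print(f"workers={n:3d}  parameter_server={central:4d}  ring={ring}")
```

Under this model the central node's load grows with the number of workers while each ring node's load stays constant, which is consistent with the reported advantage on low-bandwidth or high-latency networks.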
Abstract
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies in the high communication cost at the central node. Motivated by this, we ask, can decentralized algorithms be faster than their centralized counterparts? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has comparable total computational complexities to…
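To make the kind of algorithm discussed in the abstract concrete, here is a minimal single-process sketch of a decentralized parallel SGD update: each simulated node averages its model with its neighbors' models through a mixing matrix and then applies its own local stochastic gradient. The ring topology, the uniform 1/3 mixing weights, the least-squares objective, and every name (`ring_mixing_matrix`, `dpsgd`, the hyperparameter defaults) are assumptions chosen for illustration, not details taken from this page.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric doubly stochastic matrix for a ring: each node gives weight
    1/3 to itself and to each of its two neighbors (an illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def dpsgd(A, b, n_nodes=8, steps=500, lr=0.05, batch=4, seed=0):
    """Simulate decentralized SGD on the least-squares loss 0.5 * ||A x - b||^2.

    Row i of X is node i's local model. The data rows are split into one
    shard per node, and each node only samples from its own shard.
    """
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(len(b)), n_nodes)
    W = ring_mixing_matrix(n_nodes)
    X = np.zeros((n_nodes, A.shape[1]))
    for _ in range(steps):
        G = np.zeros_like(X)
        for i, shard in enumerate(shards):
            idx = rng.choice(shard, size=batch)       # local minibatch
            residual = A[idx] @ X[i] - b[idx]
            G[i] = A[idx].T @ residual / batch        # local stochastic gradient
        # Neighbor averaging (W @ X) followed by the local gradient step.
        X = W @ X - lr * G
    return X.mean(axis=0)                             # average of local models


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(256, 5))
    x_true = rng.normal(size=5)
    b = A @ x_true                                    # noiseless targets
    x_hat = dpsgd(A, b)
    print("max abs error:", np.abs(x_hat - x_true).max())
```

In this sketch no node ever talks to a central server: the only communication per step is the neighbor averaging encoded by `W`, which on a ring touches two neighbors per node regardless of how many nodes participate.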
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Age of Information Optimization
