Communication-Censored Distributed Stochastic Gradient Descent
Weiyu Li, Tianyi Chen, Liping Li, Zhaoxian Wu, Qing Ling

TL;DR
This paper introduces a communication-censoring technique in distributed stochastic gradient descent to significantly reduce communication costs while maintaining convergence rates, benefiting large-scale distributed machine learning.
Contribution
The paper proposes a communication-censoring approach for distributed SGD. Unlike existing methods based on quantization or sparsification, it reduces the number of transmissions by having workers send only sufficiently informative gradients.
Findings
CSGD achieves the same convergence rate as standard SGD.
CSGD reduces communication by selectively transmitting gradients.
Numerical experiments confirm substantial communication savings.
Abstract
This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine learning. Different from the existing works based on quantization and sparsification, we introduce a communication-censoring technique to reduce the transmissions of variables, which leads to our communication-Censored distributed Stochastic Gradient Descent (CSGD) algorithm. Specifically, in CSGD, the latest mini-batch stochastic gradient at a worker will be transmitted to the server if and only if it is sufficiently informative. When the latest gradient is not available, the stale one will be reused at the server. To implement this communication-censoring strategy, the batch size is increased in order to alleviate the effect of stochastic gradient noise.…
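The censoring idea in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' algorithm: the abstract does not state how "sufficiently informative" is measured, so the test used here (squared change from the last transmitted gradient compared with a decaying threshold), the linear batch-size growth, the 1-D least-squares objective, and the name `run_censored_sgd` are all assumptions made for illustration.

```python
import random


def run_censored_sgd(num_workers=4, steps=60, lr=0.1, seed=0):
    """Toy censored SGD on a 1-D least-squares problem (illustrative only).

    Returns (final x, optimum, transmissions used, transmissions without censoring).
    """
    rng = random.Random(seed)
    # Worker i draws samples around targets[i]; the minimizer of the
    # averaged loss 0.5 * E[(x - sample)^2] is the mean of the targets.
    targets = [rng.uniform(-1.0, 1.0) for _ in range(num_workers)]
    optimum = sum(targets) / num_workers

    x = 5.0
    # Server-side copy of the last gradient each worker transmitted.
    server_copy = [0.0] * num_workers
    sent = 0

    for k in range(1, steps + 1):
        batch_size = k              # ASSUMPTION: batch size grows linearly
        threshold = 1.0 / (k * k)   # ASSUMPTION: decaying censoring threshold
        for i in range(num_workers):
            samples = [rng.gauss(targets[i], 1.0) for _ in range(batch_size)]
            grad = sum(x - s for s in samples) / batch_size
            # ASSUMPTION: "informative" means the gradient moved far enough
            # from the copy the server already holds.
            if (grad - server_copy[i]) ** 2 >= threshold:
                server_copy[i] = grad   # transmit the fresh gradient
                sent += 1
            # Otherwise nothing is sent and the server reuses the stale copy.
        x -= lr * sum(server_copy) / num_workers

    return x, optimum, sent, steps * num_workers


if __name__ == "__main__":
    x, optimum, sent, total = run_censored_sgd()
    print(f"x={x:.4f} optimum={optimum:.4f} transmissions={sent}/{total}")
```

In this toy setting the growing batch size shrinks the gradient noise, so a fresh gradient that differs from the stored one mostly reflects a real change in the iterate rather than sampling noise, which is what makes skipping a transmission safe.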
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent
