Distributed Sparse SGD with Majority Voting
Kerem Ozfatura, Emre Ozfatura, Deniz Gunduz

TL;DR
This paper introduces a majority-voting-based sparse communication strategy for distributed SGD that significantly reduces communication load while maintaining accuracy, as demonstrated through extensive CIFAR-10 simulations.
Contribution
The novel majority voting approach aligns sparsity patterns among workers, enabling high compression rates without accuracy loss in distributed training.
Findings
Achieves up to 4000x compression with no accuracy loss
Reduces communication load in distributed SGD
Maintains consistent sparsity patterns across workers
Abstract
Distributed learning, particularly variants of distributed stochastic gradient descent (DSGD), is widely employed to speed up training by leveraging the computational resources of several workers. However, in practice, communication delay becomes a bottleneck due to the significant amount of information that needs to be exchanged between the workers and the parameter server. One of the most efficient strategies to mitigate the communication bottleneck is top-K sparsification. However, top-K sparsification requires additional communication load to represent the sparsity pattern, and the mismatch between the sparsity patterns of the workers prevents exploitation of efficient communication protocols. To address these issues, we introduce a novel majority-voting-based sparse communication strategy, in which the workers first seek a consensus on the structure of the sparse representation. This…
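The abstract describes two ingredients: per-worker top-K sparsification and a consensus step on the sparsity pattern. The following is a minimal sketch of how these could fit together, not the authors' implementation. It assumes that each worker votes for the indices of its K largest-magnitude gradient entries, that the K most-voted indices form the shared mask, and that ties are broken in favor of the lowest index; the function names `top_k_indices`, `majority_vote_mask`, and `aggregate` are hypothetical.

```python
import numpy as np

def top_k_indices(grad, k):
    """Indices of the k largest-magnitude entries of a 1-D gradient."""
    return np.argpartition(np.abs(grad), -k)[-k:]

def majority_vote_mask(worker_grads, k):
    """Each worker votes for its own top-k indices; the k most-voted
    indices become the shared sparsity pattern (assumed voting rule)."""
    votes = np.zeros(worker_grads[0].size, dtype=np.int64)
    for g in worker_grads:
        votes[top_k_indices(g, k)] += 1
    # Stable sort on negated counts: ties go to the lowest index.
    order = np.argsort(-votes, kind="stable")
    return np.sort(order[:k])

def aggregate(worker_grads, mask):
    """Average only the agreed-upon entries; all others stay zero."""
    avg = np.zeros_like(worker_grads[0])
    avg[mask] = np.mean([g[mask] for g in worker_grads], axis=0)
    return avg

# Toy example: three workers, six parameters, k = 2.
grads = [
    np.array([5.0, 0.1, -4.0, 0.2, 0.3, 0.0]),
    np.array([4.0, 0.2, 0.1, -3.0, 0.0, 0.1]),
    np.array([-6.0, 0.1, 3.0, 0.2, 0.1, 0.0]),
]
mask = majority_vote_mask(grads, k=2)
avg = aggregate(grads, mask)
print(mask, avg)
```

In this toy run every worker votes for index 0, two workers vote for index 2, and one votes for index 3, so the shared mask is indices 0 and 2. Because all workers then transmit values at the same positions, the pattern does not need to be sent alongside each worker's values, which is the kind of alignment the abstract refers to.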
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
