Distributed Sparse SGD with Majority Voting

Kerem Ozfatura; Emre Ozfatura; Deniz Gunduz

arXiv:2011.06495 · cs.LG · November 13, 2020 · 1 citation

Open Access

TL;DR

This paper introduces a majority-voting-based sparse communication strategy for distributed SGD that significantly reduces communication load while maintaining accuracy, as demonstrated through extensive simulations on CIFAR-10.

Contribution

The novel majority voting approach aligns sparsity patterns among workers, enabling high compression rates without accuracy loss in distributed training.

Findings

01 Achieves up to 4000x compression with no accuracy loss

02 Reduces communication load in distributed SGD

03 Maintains consistent sparsity patterns across workers

Abstract

Distributed learning, particularly variants of distributed stochastic gradient descent (DSGD), is widely employed to speed up training by leveraging the computational resources of several workers. In practice, however, communication delay becomes a bottleneck because of the significant amount of information that must be exchanged between the workers and the parameter server. One of the most efficient strategies to mitigate the communication bottleneck is top-K sparsification. However, top-K sparsification requires additional communication load to represent the sparsity pattern, and the mismatch between the sparsity patterns of the workers prevents the exploitation of efficient communication protocols. To address these issues, we introduce a novel majority-voting-based sparse communication strategy, in which the workers first seek a consensus on the structure of the sparse representation. This…
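
The abstract is truncated here, so the paper's exact voting rule is not shown on this page. As a rough illustration only, the idea of workers agreeing on one shared sparsity pattern before sending values might be sketched as follows. The function names (`top_k_indices`, `majority_vote_mask`, `aggregate_on_mask`), the tie-breaking rule, and the choice to keep the K most-voted coordinates are all assumptions made for this sketch, not details taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def top_k_indices(grad: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest-magnitude entries (ties go to the lower index)."""
    order = sorted(range(len(grad)), key=lambda i: (-abs(grad[i]), i))
    return sorted(order[:k])


def majority_vote_mask(grads: Sequence[Sequence[float]], k: int) -> List[int]:
    """Assumed voting rule: each worker votes for its own top-k indices,
    and the k indices with the most votes form the shared sparsity pattern."""
    votes = Counter()
    for g in grads:
        votes.update(top_k_indices(g, k))
    ranked = sorted(votes, key=lambda i: (-votes[i], i))
    return sorted(ranked[:k])


def aggregate_on_mask(grads: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the workers' gradients on the agreed coordinates only;
    every other coordinate is treated as zero."""
    avg = [0.0] * len(grads[0])
    for i in mask:
        avg[i] = sum(g[i] for g in grads) / len(grads)
    return avg


# Toy example: three workers, six parameters, keep K = 2 coordinates.
worker_grads = [
    [0.9, -0.1, 0.05, 0.7, 0.0, -0.2],
    [0.8, 0.3, -0.02, 0.1, 0.0, -0.6],
    [1.0, 0.0, 0.1, -0.5, 0.2, 0.1],
]
mask = majority_vote_mask(worker_grads, k=2)
sparse_avg = aggregate_on_mask(worker_grads, mask)
print(mask)  # [0, 3]
```

In this toy run the workers' individual top-2 sets are {0, 3}, {0, 5}, and {0, 3}. Once the mask is shared, every worker sends values for the same coordinates, so no per-worker index list has to accompany the values. The cost of the voting round itself is not modelled here.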

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning