signSGD with Majority Vote is Communication Efficient And Fault Tolerant

Jeremy Bernstein; Jiawei Zhao; Kamyar Azizzadenesheli; Anima Anandkumar

arXiv:1810.05291·cs.DC·February 26, 2019·125 cites

TL;DR

This paper introduces signSGD with majority vote, a communication-efficient and fault-tolerant algorithm for distributed training of neural networks. It proves that the algorithm converges and reports practical training speedups.

Contribution

The paper proves that signSGD with majority vote converges under natural conditions, shows that the vote is robust to adversarial workers, and provides a practical implementation with benchmarks.

Findings

01. Uses 32x less communication per iteration than full-precision distributed SGD.

02. Trains ResNet50 on ImageNet 25% faster when using 15 machines.

03. Proves that majority vote remains robust when up to 50% of workers behave adversarially.
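The 32x figure in the first finding follows from simple arithmetic: a full-precision gradient entry takes 32 bits, while its sign takes 1 bit. The sketch below illustrates this with NumPy bit packing. The helper names `pack_signs` and `unpack_signs` are hypothetical, and treating a zero entry as positive is an assumption made here for illustration, not something the paper specifies.

```python
import numpy as np

def pack_signs(grad):
    """Pack the sign of each gradient entry into one bit (1 = non-negative).

    Assumption: zero entries are encoded as +1.
    """
    bits = (grad >= 0).astype(np.uint8)
    return np.packbits(bits)

def unpack_signs(packed, size):
    """Recover a vector of +1/-1 values from the packed bits."""
    bits = np.unpackbits(packed)[:size]
    return bits.astype(np.int8) * 2 - 1

# A float32 gradient with 1024 entries occupies 4096 bytes.
grad = np.random.default_rng(0).standard_normal(1024).astype(np.float32)

# The packed signs occupy 1024 / 8 = 128 bytes.
packed = pack_signs(grad)

# Ratio of bytes sent: full precision versus sign only.
ratio = grad.nbytes / packed.nbytes
```

Here `ratio` compares the payload sizes only; it ignores any protocol or framing overhead a real system would add.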

Abstract

Training neural networks on large datasets can be accelerated by distributing the workload over a network of machines. As datasets grow ever larger, networks of hundreds or thousands of machines become economically viable. The time cost of communicating gradients limits the effectiveness of using such large machine counts, as may the increased chance of network faults. We explore a particularly simple algorithm for robust, communication-efficient learning---signSGD. Workers transmit only the sign of their gradient vector to a server, and the overall update is decided by a majority vote. This algorithm uses $32 \times$ less communication per iteration than full-precision, distributed SGD. Under natural conditions verified by experiment, we prove that signSGD converges in the large and mini-batch settings, establishing convergence for a parameter regime of Adam as a byproduct. Aggregating…
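The update rule described in the abstract (workers send gradient signs, the server takes a majority vote) can be sketched in a few lines of NumPy. This is a minimal single-process illustration, not the authors' implementation: the function names, the learning rate, and the toy gradients are hypothetical, and the adversaries are assumed to simply negate their gradient, which is only one possible adversarial model.

```python
import numpy as np

def majority_vote(sign_votes):
    """Server side: elementwise sign of the sum of the workers' sign vectors.

    With np.sign, an exact tie yields 0, so that coordinate is left unchanged.
    """
    return np.sign(np.sum(sign_votes, axis=0))

def signsgd_step(params, worker_grads, lr):
    """One step: each worker contributes sign(gradient), the server votes,
    and the voted sign vector is applied as the update."""
    votes = np.sign(worker_grads)  # shape: (num_workers, dim)
    return params - lr * majority_vote(votes)

# Toy setup (assumed): 4 honest workers share the same noise-free gradient,
# and 3 adversarial workers send the negated gradient.
true_grad = np.array([0.5, -2.0, 1.5])
honest = np.tile(true_grad, (4, 1))
adversarial = -np.tile(true_grad, (3, 1))
worker_grads = np.vstack([honest, adversarial])

new_params = signsgd_step(np.zeros(3), worker_grads, lr=0.1)
```

Because the honest workers outnumber the adversarial ones in this toy setup, the vote matches the sign of the true gradient in every coordinate, and each parameter moves by exactly `lr` in the descent direction.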

Taxonomy

Topics: Retinal Imaging and Analysis

Methods: Stochastic Gradient Descent · Adam