signSGD with Majority Vote is Communication Efficient And Fault Tolerant
Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar

TL;DR
This paper introduces signSGD with majority vote, a communication-efficient and fault-tolerant distributed training algorithm for neural networks, demonstrating theoretical convergence and practical speedups over existing methods.
Contribution
The paper proves convergence of signSGD with majority vote under natural conditions and shows its robustness against adversarial worker behavior, with practical implementation and benchmarking.
Findings
Uses 32x less communication than full-precision SGD.
Reduces the time to train ResNet50 on ImageNet by 25% when using 15 machines.
Proves robustness of majority vote against up to 50% adversarial workers.
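The 32x figure follows from simple arithmetic: a full-precision gradient entry takes 32 bits, while its sign needs only 1. The sketch below illustrates this with NumPy bit packing. The helper names `pack_signs` and `unpack_signs` are hypothetical, and mapping a zero entry to +1 is an assumption made here for simplicity, not something taken from the paper.

```python
import numpy as np

def pack_signs(grad):
    """Encode each gradient entry's sign as one bit (1 if >= 0, else 0).

    Treating zero as positive is an assumption of this sketch.
    """
    bits = (grad >= 0).astype(np.uint8)
    return np.packbits(bits)  # 8 signs per byte

def unpack_signs(packed, n):
    """Recover a vector of +1/-1 values from the packed bits."""
    bits = np.unpackbits(packed)[:n]
    return bits.astype(np.float32) * 2.0 - 1.0

rng = np.random.default_rng(0)
grad = rng.standard_normal(1024).astype(np.float32)  # toy float32 gradient

packed = pack_signs(grad)
ratio = grad.nbytes / packed.nbytes  # bytes sent at full precision vs. signs only
signs = unpack_signs(packed, grad.size)
print(f"float32 bytes: {grad.nbytes}, packed bytes: {packed.nbytes}, ratio: {ratio}")
```

The ratio is exact here only because the vector length is a multiple of 8; other lengths pad the final byte.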
Abstract
Training neural networks on large datasets can be accelerated by distributing the workload over a network of machines. As datasets grow ever larger, networks of hundreds or thousands of machines become economically viable. The time cost of communicating gradients limits the effectiveness of using such large machine counts, as may the increased chance of network faults. We explore a particularly simple algorithm for robust, communication-efficient learning: signSGD. Workers transmit only the sign of their gradient vector to a server, and the overall update is decided by a majority vote. This algorithm uses 32x less communication per iteration than full-precision, distributed SGD. Under natural conditions verified by experiment, we prove that signSGD converges in the large and mini-batch settings, establishing convergence for a parameter regime of Adam as a byproduct. Aggregating…
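The update rule described in the abstract can be sketched in a few lines of NumPy: each worker sends the sign of its gradient, and the server takes an elementwise majority vote. This is a minimal illustration, not the authors' implementation. The quadratic objective, the Gaussian gradient noise, the learning rate, and the sign-flipping adversary are all assumptions chosen to keep the example small, and `majority_vote_step` and `run` are hypothetical names.

```python
import numpy as np

def majority_vote_step(params, worker_grads, lr):
    """One signSGD step: workers send sign(g), server applies the majority sign."""
    signs = np.sign(np.stack(worker_grads))  # shape: (num_workers, dim)
    vote = np.sign(signs.sum(axis=0))        # elementwise majority vote
    return params - lr * vote

def run(num_workers=15, num_adversaries=0, steps=200, lr=0.01, seed=0):
    """Minimize the toy objective f(x) = 0.5 * ||x||^2 with noisy gradients."""
    rng = np.random.default_rng(seed)
    x = np.ones(10)
    for _ in range(steps):
        # Each worker sees the true gradient x plus independent noise (assumed model).
        grads = [x + 0.1 * rng.standard_normal(x.shape) for _ in range(num_workers)]
        # Assumed adversary: the first few workers negate their gradient.
        for i in range(num_adversaries):
            grads[i] = -grads[i]
        x = majority_vote_step(x, grads, lr)
    return x

honest = run(num_adversaries=0)
attacked = run(num_adversaries=7)  # 7 of 15 workers, just under half
print("final norm, no adversaries:", np.linalg.norm(honest))
print("final norm, 7 adversaries: ", np.linalg.norm(attacked))
```

Using an odd number of workers avoids tied votes in this toy setup; how ties are broken in practice is not specified in the text above.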
Taxonomy
Topics: Retinal Imaging and Analysis
Methods: Stochastic Gradient Descent · Adam
