signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar

TL;DR
signSGD is a gradient compression method that transmits only the sign of gradients, achieving communication efficiency while maintaining convergence rates similar to standard SGD, with theoretical guarantees and practical success on deep learning models.
Contribution
This paper introduces signSGD with theoretical convergence guarantees and extends it to distributed settings using majority vote for 1-bit communication.
Findings
signSGD matches SGD convergence rates
Momentum signSGD matches the accuracy and convergence speed of Adam on deep ImageNet models
Majority vote enables 1-bit gradient compression in distributed training
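The update rules behind the first two findings can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `signsgd_step` and `signum_step`, the default `beta=0.9`, and the toy quadratic objective are all assumptions chosen for the example. The momentum form shown (an exponential moving average of gradients followed by a sign) is one common way to write momentum signSGD.

```python
import numpy as np

def signsgd_step(x, grad, lr):
    """One signSGD step: move each coordinate by lr against the gradient's sign."""
    # np.sign returns 0 for exactly-zero entries, so those coordinates stay put.
    return x - lr * np.sign(grad)

def signum_step(x, grad, m, lr, beta=0.9):
    """One momentum signSGD step (hypothetical helper; beta=0.9 is an assumed default)."""
    # Exponential moving average of the stochastic gradients.
    m = beta * m + (1.0 - beta) * grad
    # Step in the direction of the sign of the averaged gradient.
    return x - lr * np.sign(m), m

# Toy usage on f(x) = 0.5 * ||x||^2 with noisy gradients (assumed example problem).
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(x)
for _ in range(200):
    g = x + 0.1 * rng.standard_normal(x.shape)  # noisy gradient of f
    x, m = signum_step(x, g, m, lr=0.01)
```

Because every coordinate moves by exactly `lr` per step regardless of gradient magnitude, the learning rate plays a larger role here than in plain SGD, which is why the sketch uses a small fixed value.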
Abstract
Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker enabling 1-bit compression of worker-server communication in both directions.…
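The majority-vote aggregation described in the abstract can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the paper's implementation: `majority_vote_step` is a hypothetical name, workers are simulated as a list of gradient arrays, and no real network communication or bit packing is performed. Ties in the vote resolve to 0 here because of how `np.sign` behaves, which is an artifact of this sketch.

```python
import numpy as np

def majority_vote_step(x, worker_grads, lr):
    """Simulate one distributed signSGD step with majority-vote aggregation."""
    # Worker -> server: each worker sends only the sign of its gradient
    # (conceptually 1 bit per coordinate).
    worker_signs = [np.sign(g) for g in worker_grads]
    # Server: sum the signs and take the sign of the sum, i.e. a
    # coordinate-wise majority vote. A tied vote yields 0 in this sketch.
    vote = np.sign(np.sum(worker_signs, axis=0))
    # Server -> workers: the vote itself is again a sign vector, so the
    # broadcast is also 1 bit per coordinate. Every worker applies the same update.
    return x - lr * vote

# Three simulated workers with disagreeing gradients (made-up numbers).
x = np.zeros(3)
grads = [
    np.array([1.0, -1.0, 2.0]),
    np.array([3.0, -2.0, -1.0]),
    np.array([-1.0, -5.0, -4.0]),
]
x_new = majority_vote_step(x, grads, lr=0.1)
```

With an odd number of workers and no exactly-zero gradient entries, ties cannot occur, which keeps the vote a strict sign vector.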
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adam · Stochastic Gradient Descent
