Moniqua: Modulo Quantized Communication in Decentralized SGD
Yucheng Lu, Christopher De Sa

TL;DR
Moniqua lets decentralized SGD communicate with as little as 1 bit per parameter while keeping the asymptotic convergence rate of full-precision communication and, in the reported ResNet experiments, without losing validation accuracy.
Contribution
Moniqua introduces a provably efficient quantization method for decentralized SGD that requires no extra memory and works with 1-bit communication.
Findings
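Moniqua converges faster in wall clock time than other quantized decentralized algorithms.
It maintains validation accuracy with 1-bit-per-parameter communication when training ResNet20.
The number of bits communicated per iteration is provably bounded, and the asymptotic convergence rate matches that of full-precision communication.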
Abstract
Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing 1-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and…
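The abstract does not spell out how a modulo operation makes low-bit communication possible. One way to picture it is the sketch below. It assumes that a sender's parameters x and a receiver's local parameters y differ by less than a known bound theta in every coordinate. The sender then transmits only a coarsely quantized x mod B, and the receiver uses y to work out which multiple of B was dropped. The names modulo_encode, modulo_decode, theta, and B are hypothetical, and deterministic nearest-level rounding is used here for simplicity. This illustrates the general idea and should not be read as the paper's exact algorithm.

```python
import numpy as np

def modulo_encode(x, B, bits):
    """Sender: wrap x into [0, B) and keep only a `bits`-bit grid index per entry."""
    levels = 2 ** bits
    step = B / levels
    wrapped = np.mod(x, B)
    # Nearest grid point; an index equal to `levels` wraps back to 0 (same value mod B).
    return np.rint(wrapped / step).astype(np.int64) % levels

def modulo_decode(idx, y, B, bits):
    """Receiver: recover an estimate of x from the index and its own nearby vector y."""
    step = B / 2 ** bits
    q = idx * step
    # Pick the representative of (q - y) mod B that lies in [-B/2, B/2).
    diff = np.mod(q - y + B / 2, B) - B / 2
    return y + diff

# Assumed setup: |x - y| < theta elementwise, 1 bit per parameter.
theta = 0.1
bits = 1
# B is chosen so that theta plus the worst rounding error (step / 2) equals B / 2.
B = 2 * theta / (1 - 2.0 ** -bits)
step = B / 2 ** bits

rng = np.random.default_rng(0)
x = 10.0 * rng.standard_normal(1000)                        # sender's parameters
y = x + rng.uniform(-0.99 * theta, 0.99 * theta, x.shape)   # receiver's nearby parameters

idx = modulo_encode(x, B, bits)       # one bit per parameter goes over the wire
x_hat = modulo_decode(idx, y, B, bits)
max_err = np.max(np.abs(x_hat - x))   # bounded by step / 2 under the assumption above
```

In this sketch the receiver needs nothing beyond its own current parameters, and the reconstruction error depends on theta rather than on the magnitude of x. That is why the scheme can tolerate a very small bit budget when neighboring workers stay close to each other.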
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
