Nested Dithered Quantization for Communication Reduction in Distributed Training
Afshin Abdi and Faramarz Fekri

TL;DR
This paper introduces nested dithered quantization techniques for distributed training that significantly reduce communication overhead while maintaining model accuracy, by leveraging the properties of dithered quantization and inter-worker gradient correlations.
Contribution
It proposes Dithered Quantized Stochastic Gradients (DQSG) and nested dithered quantization (NDQSG), providing convergence analysis and demonstrating reduced communication with no accuracy loss.
Findings
DQSG behaves like unquantized gradients perturbed by independent, bounded uniform noise.
NDQSG reduces the number of communication bits without additional communication among workers.
Simulations show comparable accuracy with fewer bits, or faster convergence.
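The first finding can be illustrated with a minimal sketch of a subtractive dithered uniform quantizer. This is not the paper's implementation: the function names, the step size, the Gaussian stand-in for a gradient vector, and the shared-seed assumption for regenerating the dither at the receiver are all illustrative choices.

```python
import numpy as np

def dithered_quantize(g, step, rng):
    """Subtractive dithered uniform quantizer (illustrative sketch)."""
    # Dither is uniform on [-step/2, step/2). The receiver is ASSUMED to
    # regenerate the same dither from a shared seed, so only q is sent.
    u = rng.uniform(-step / 2, step / 2, size=g.shape)
    q = np.round((g + u) / step).astype(np.int64)  # integer indices to transmit
    return q, u

def dithered_dequantize(q, u, step):
    """Reconstruct by mapping indices back to levels and subtracting the dither."""
    return q * step - u

rng = np.random.default_rng(0)
g = rng.normal(size=100_000)   # hypothetical stand-in for a stochastic gradient
step = 0.5                     # hypothetical quantization step
q, u = dithered_quantize(g, step, rng)
g_hat = dithered_dequantize(q, u, step)
err = g_hat - g                # reconstruction error

print("max |err|      :", np.max(np.abs(err)))      # bounded by step / 2
print("mean of err    :", err.mean())               # close to 0
print("variance of err:", err.var(), "vs step^2/12 =", step**2 / 12)
print("corr(err, g)   :", np.corrcoef(err, g)[0, 1])  # close to 0
```

Under these assumptions the error stays within half a step, has variance near step²/12, and is empirically uncorrelated with the input, which is consistent with the "independent bounded uniform noise" description.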
Abstract
In distributed training, the communication cost due to the transmission of gradients or the parameters of the deep model is a major bottleneck in scaling up the number of processing nodes. To address this issue, we propose dithered quantization for the transmission of the stochastic gradients and show that training with Dithered Quantized Stochastic Gradients (DQSG) is similar to training with unquantized SGs perturbed by an independent, bounded uniform noise. This contrasts with other quantization methods, in which the perturbation depends on the gradients and hence complicates the convergence analysis. We study the convergence of training algorithms using DQSG and the trade-off between the number of quantization levels and the training time. Next, we observe that there is a correlation among the SGs computed by workers that can be utilized to further reduce the…
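To give a rough idea of how correlation between workers could save bits, here is a sketch of a generic nested scalar quantizer with decoder side information. It is a textbook-style construction, not the NDQSG algorithm from the paper, whose details are not given on this page. The names `nested_encode` and `nested_decode`, the step sizes, and the model in which the receiver already holds a correlated vector `y` are assumptions made for illustration.

```python
import numpy as np

def nested_encode(x, u, fine, k):
    """Quantize x + u on a fine grid, then keep only the index modulo k."""
    n = np.round((x + u) / fine).astype(np.int64)  # fine-grid index
    return np.mod(n, k)                            # values in {0, ..., k-1}

def nested_decode(idx, u, y, fine, k):
    """Resolve the modulo ambiguity using side information y close to x."""
    coarse = k * fine
    r = idx * fine - u - y
    # Fold r into [-coarse/2, coarse/2]; correct whenever
    # |x - y| + fine/2 < coarse/2.
    return y + r - coarse * np.round(r / coarse)

rng = np.random.default_rng(1)
fine, k = 0.1, 8                                # hypothetical: 3 bits per entry
x = rng.normal(size=50_000)                     # stand-in for one worker's gradient
y = x + rng.uniform(-0.2, 0.2, size=x.shape)    # ASSUMED correlated side information
u = rng.uniform(-fine / 2, fine / 2, size=x.shape)  # shared dither

idx = nested_encode(x, u, fine, k)
x_hat = nested_decode(idx, u, y, fine, k)
print("max |x_hat - x|:", np.max(np.abs(x_hat - x)))  # at most fine / 2
```

In this toy setting each entry is sent with log2(k) bits regardless of the range of `x`, and the fine-grid accuracy is recovered only because `y` stays within the stated distance of `x`; if the correlation is weaker than assumed, decoding errors of a full coarse step can occur.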
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
