Double Quantization for Communication-Efficient Distributed Optimization
Yue Yu, Jiaxiang Wu, Longbo Huang

TL;DR
This paper introduces double quantization, a scheme that reduces communication cost in distributed machine learning by quantizing both model parameters and gradients. The proposed algorithms come with performance guarantees and reduce the number of bits transmitted.
Contribution
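It proposes a general double quantization scheme and three algorithms built on it: AsyLPG (low-precision with asynchronous parallelism), Sparse-AsyLPG (adding gradient sparsification), and accelerated AsyLPG (adding momentum). All three have theoretical guarantees and are evaluated on a multi-server test-bed.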
Findings
Significant reduction in communication bits without performance loss.
Effective integration of gradient sparsification with double quantization.
Algorithms demonstrate practical efficiency on multi-server setups.
Abstract
Modern distributed training of machine learning models suffers from high communication overhead for synchronizing stochastic gradients and model parameters. In this paper, to reduce the communication complexity, we propose double quantization, a general scheme for quantizing both model parameters and gradients. Three communication-efficient algorithms are proposed under this general scheme. Specifically, (i) we propose a low-precision algorithm AsyLPG with asynchronous parallelism, (ii) we explore integrating gradient sparsification with double quantization and develop Sparse-AsyLPG, (iii) we show that double quantization can also be accelerated by momentum technique and design accelerated AsyLPG. We establish rigorous performance guarantees for the algorithms, and conduct experiments on a multi-server test-bed to demonstrate that our algorithms can effectively save transmitted…
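To make the idea of quantizing in both directions concrete, here is a minimal sketch in Python. It is not the paper's AsyLPG algorithm: it is a plain synchronous gradient step on a toy quadratic objective, and the quantizer is an assumed generic unbiased stochastic uniform quantizer. The names stochastic_quantize and double_quantized_step, the bit widths, and the objective are all hypothetical and chosen only for illustration.

```python
import random
from typing import List


def stochastic_quantize(x: List[float], bits: int, rng: random.Random) -> List[float]:
    """Round each entry onto a uniform grid of 2**bits points on [-s, s], s = max|x|.

    Rounding up or down is randomized so that the result is unbiased in
    expectation. This is an assumed, generic quantizer, not the paper's.
    """
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return [0.0] * len(x)
    levels = 2 ** bits - 1              # number of grid intervals
    step = 2.0 * scale / levels
    out = []
    for v in x:
        pos = (v + scale) / step        # position in grid units, in [0, levels]
        low = min(int(pos), levels)     # lower neighbouring grid index
        frac = pos - low                # distance to the lower grid point
        idx = low + (1 if rng.random() < frac else 0)
        out.append(-scale + min(idx, levels) * step)
    return out


def double_quantized_step(
    w: List[float],
    target: List[float],
    lr: float,
    bits_w: int,
    bits_g: int,
    rng: random.Random,
) -> List[float]:
    """One gradient step on 0.5 * ||w - target||^2 with both messages quantized."""
    # Server -> worker: the model parameters are sent in low precision.
    w_q = stochastic_quantize(w, bits_w, rng)
    # Worker: the gradient is evaluated at the quantized parameters.
    grad = [a - b for a, b in zip(w_q, target)]
    # Worker -> server: the gradient is sent in low precision.
    g_q = stochastic_quantize(grad, bits_g, rng)
    # Server: the full-precision parameters are updated.
    return [a - lr * g for a, g in zip(w, g_q)]


if __name__ == "__main__":
    rng = random.Random(0)
    w = [1.0, -2.0, 0.5]
    for _ in range(50):
        w = double_quantized_step(w, [0.0, 0.0, 0.0], 0.1, 8, 8, rng)
    print(w)
```

In this sketch the server keeps a full-precision copy of the parameters and only the transmitted messages are quantized, so each coordinate costs bits_w or bits_g bits on the wire plus one scale value per message.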
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
Methods: Gradient Sparsification
