Double Quantization for Communication-Efficient Distributed Optimization

Yue Yu; Jiaxiang Wu; Longbo Huang

arXiv:1805.10111 · math.OC · May 28, 2019 · 20 cites

TL;DR

This paper introduces double quantization, a scheme that reduces communication cost in distributed machine learning by quantizing both the model parameters and the gradients, and presents algorithms that transmit fewer bits without performance loss.

Contribution

It proposes a general double quantization scheme and three algorithms built on it: the asynchronous low-precision AsyLPG, the sparsified Sparse-AsyLPG, and a momentum-accelerated AsyLPG, each with theoretical guarantees and experimental validation.

Findings

01 Significant reduction in communication bits without performance loss.
02 Effective integration of gradient sparsification with double quantization.
03 The algorithms prove efficient in experiments on a multi-server test-bed.
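
To make the second finding concrete, here is a minimal sketch of gradient sparsification, the step that Sparse-AsyLPG combines with double quantization. This page does not say which sparsifier the paper uses, so the code assumes a common unbiased variant: each coordinate is kept with a probability proportional to its magnitude and rescaled by the inverse of that probability. The names `random_sparsify` and `keep_fraction` are hypothetical and not taken from the paper.

```python
import random

def random_sparsify(grad, keep_fraction, rng):
    """Unbiased random sparsification (an assumed variant, not the paper's).

    Coordinate i is kept with probability p_i and rescaled by 1 / p_i,
    so the expected output equals the dense gradient.
    """
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return [0.0] * len(grad)
    # Expected number of kept coordinates is at most this budget.
    budget = keep_fraction * len(grad)
    out = []
    for g in grad:
        p = min(1.0, budget * abs(g) / total)  # keep probability
        if p > 0.0 and rng.random() < p:
            out.append(g / p)                  # rescale to stay unbiased
        else:
            out.append(0.0)                    # dropped: nothing to send
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    dense = [4.0, -1.0, 0.5, 0.0, 2.5]
    print(random_sparsify(dense, keep_fraction=0.4, rng=rng))
```

In a combined scheme the surviving nonzero entries would then be quantized before transmission; that step is left out here to keep the sketch short.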

Abstract

Modern distributed training of machine learning models suffers from high communication overhead for synchronizing stochastic gradients and model parameters. In this paper, to reduce the communication complexity, we propose double quantization, a general scheme for quantizing both model parameters and gradients. Three communication-efficient algorithms are proposed under this general scheme. Specifically, (i) we propose a low-precision algorithm AsyLPG with asynchronous parallelism, (ii) we explore integrating gradient sparsification with double quantization and develop Sparse-AsyLPG, (iii) we show that double quantization can also be accelerated by the momentum technique and design accelerated AsyLPG. We establish rigorous performance guarantees for the algorithms, and conduct experiments on a multi-server test-bed to demonstrate that our algorithms can effectively save transmitted…
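
The core idea in the abstract, quantizing both the parameters sent to workers and the gradients sent back, can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's AsyLPG: it runs synchronously in a single process, uses a simple unbiased stochastic-rounding quantizer on a uniform grid, and the function names, bit widths, learning rate, and toy quadratic objective are all hypothetical.

```python
import math
import random

def stochastic_quantize(vec, bits, rng):
    """Round each entry onto a uniform grid of 2**bits points spanning
    [-scale, scale], scale = max |v_i|, rounding up or down at random so
    that the result is unbiased. (Assumed quantizer, for illustration.)"""
    scale = max(abs(v) for v in vec)
    if scale == 0.0:
        return list(vec)
    levels = 2 ** bits - 1              # number of grid intervals
    step = 2.0 * scale / levels
    out = []
    for v in vec:
        pos = (v + scale) / step        # position in grid units
        low = min(int(pos), levels)     # lower grid index
        frac = pos - low                # distance to the lower grid point
        idx = low + (1 if rng.random() < frac else 0)
        out.append(-scale + min(idx, levels) * step)
    return out

def double_quantized_step(x, grad_fn, lr, bits_x, bits_g, rng):
    """One update in which both communicated vectors are low precision."""
    x_q = stochastic_quantize(x, bits_x, rng)   # parameters sent to the worker
    g = grad_fn(x_q)                            # gradient at the quantized point
    g_q = stochastic_quantize(g, bits_g, rng)   # gradient sent back
    return [xi - lr * gi for xi, gi in zip(x, g_q)]

def run_demo(steps=200, seed=0):
    """Minimize 0.5 * ||x - target||^2 and return the final distance."""
    rng = random.Random(seed)
    target = [1.0, -2.0, 3.0]
    grad_fn = lambda x: [xi - ti for xi, ti in zip(x, target)]
    x = [0.0, 0.0, 0.0]
    for _ in range(steps):
        x = double_quantized_step(x, grad_fn, 0.1, 8, 8, rng)
    return math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, target)))

if __name__ == "__main__":
    print("final distance to optimum:", run_demo())
```

The full-precision iterate `x` stays on the server side of this sketch; only `x_q` and `g_q` would cross the network, which is where the saving in transmitted bits comes from.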

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data

Methods: Gradient Sparsification