QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Dan Alistarh; Demjan Grubic; Jerry Li; Ryota Tomioka; Milan Vojnovic

arXiv:1610.02132 · cs.LG · December 7, 2017 · 909 citations

TL;DR

QSGD introduces a provably convergent gradient quantization method that reduces communication costs in distributed SGD, enabling faster training of deep neural networks without sacrificing accuracy.

Contribution

The paper proposes a family of gradient compression schemes with theoretical convergence guarantees and demonstrates their practical efficiency in training deep neural networks.
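To make "quantized gradients" concrete, here is a minimal Python sketch of an unbiased stochastic quantizer of the kind QSGD is built on. Each vector is scaled by its 2-norm, and every coordinate is randomly rounded to one of `s` levels so that its expected value equals the original. The names `quantize` and `dequantize` and the parameter `s` are illustrative assumptions. This is not the authors' implementation, and it omits the lossless encoding step that the paper applies on top of quantization.

```python
import math
import random
from typing import List, Tuple


def quantize(v: List[float], s: int, rng: random.Random) -> Tuple[float, List[int]]:
    """Stochastically quantize v to s levels per coordinate (illustrative sketch).

    Returns the 2-norm of v and a signed integer level in [-s, s] per coordinate;
    coordinate i is reconstructed as norm * level_i / s.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return 0.0, [0] * len(v)
    levels = []
    for x in v:
        # Scale |x| / ||v|| from [0, 1] to [0, s]; clamp to s to guard
        # against floating-point overshoot.
        r = min(abs(x) / norm * s, float(s))
        lower = math.floor(r)
        # Round up with probability equal to the fractional part, so the
        # expected level equals r and the estimate is unbiased.
        level = lower + 1 if rng.random() < r - lower else lower
        levels.append(level if x >= 0 else -level)
    return norm, levels


def dequantize(norm: float, levels: List[int], s: int) -> List[float]:
    """Rebuild an unbiased estimate of the original vector."""
    return [norm * level / s for level in levels]


if __name__ == "__main__":
    rng = random.Random(0)
    v = [0.5, -1.0, 0.25, 0.0]
    s, trials = 4, 20000
    mean = [0.0] * len(v)
    for _ in range(trials):
        norm, levels = quantize(v, s, rng)
        for i, x in enumerate(dequantize(norm, levels, s)):
            mean[i] += x / trials
    # The empirical mean should be close to v, illustrating unbiasedness.
    print([round(m, 2) for m in mean])
```

Only the norm and the small integers in `levels` need to be sent between nodes, and this is where the communication savings come from. A larger `s` costs more bits per coordinate but gives a less noisy estimate.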

Findings

1. Significant reduction in communication cost during training.
2. Training speedup of 1.8x for ResNet-152 on ImageNet.
3. Maintains or slightly improves accuracy with quantization.

Abstract

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allow the compression of gradient updates at each node, while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and…
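Some back-of-the-envelope arithmetic makes the compression side of this trade-off concrete. The sketch below assumes a deliberately naive fixed-width encoding: one 32-bit float for the norm, then a sign bit and ceil(log2(s + 1)) bits per coordinate. This is not the paper's Elias-based encoding. The sketch pairs that cost with the variance bound min(n/s^2, sqrt(n)/s) * ||v||^2 on the quantized estimate, which is reported in the full paper and taken here as given. The function names are hypothetical.

```python
import math


def naive_bits(n: int, s: int) -> int:
    """Bits per quantized n-dimensional gradient under a naive fixed-width code.

    Hypothetical encoding, not the paper's: a 32-bit float for the norm, then
    for each coordinate one sign bit plus ceil(log2(s + 1)) bits for its level.
    """
    level_bits = math.ceil(math.log2(s + 1))
    return 32 + n * (1 + level_bits)


def variance_factor(n: int, s: int) -> float:
    """Bound on E||Q_s(v) - v||^2 / ||v||^2, taken as given: min(n/s^2, sqrt(n)/s)."""
    return min(n / s**2, math.sqrt(n) / s)


if __name__ == "__main__":
    n = 1_000_000  # gradient dimension (illustrative)
    for s in (1, 3, 15, 255):
        bits = naive_bits(n, s)
        ratio = 32 * n / bits  # versus sending n 32-bit floats
        print(f"s={s:>3}: {bits:>9} bits, {ratio:4.1f}x smaller, "
              f"variance factor <= {variance_factor(n, s):.1f}")
```

Fewer levels shrink the message but loosen the variance bound. Under this naive code, going from `s = 255` to `s = 1` cuts the message from 9 to 2 bits per coordinate, while the bound grows accordingly.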

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent