Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Shuai Zheng, Ziyue Huang, James T. Kwok

TL;DR
This paper introduces a communication-efficient distributed momentum SGD method using blockwise gradient compression with error feedback, achieving significant communication reduction without sacrificing convergence or accuracy.
Contribution
The paper proposes a blockwise gradient compressor with error feedback for distributed SGD with Nesterov's momentum, compressing gradients both to and from workers, and provides a convergence analysis on nonconvex problems for general gradient compressors.
Findings
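Achieves a nearly 32x reduction in communication cost by transmitting each gradient block in 1-bit format with a scaling factor.
Converges as fast as full-precision distributed momentum SGD in experiments.
Reduces wall-clock training time by 46% on ImageNet with ResNet.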
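As a rough check on the communication figure in the findings above, the arithmetic can be sketched in a few lines. This is an illustration under stated assumptions, not a figure from the paper: full-precision gradients are taken to be 32-bit floats, each block is assumed to send one 32-bit scaling factor alongside 1 bit per coordinate, and the parameter and block counts are hypothetical.

```python
def compression_ratio(num_params, num_blocks, float_bits=32):
    """Ratio of full-precision bits to blockwise 1-bit bits (illustrative)."""
    full_bits = num_params * float_bits
    # 1 sign bit per coordinate, plus one float scaling factor per block.
    compressed_bits = num_params + num_blocks * float_bits
    return full_bits / compressed_bits


# Hypothetical model: 25 million parameters split into 160 blocks.
ratio = compression_ratio(25_000_000, 160)
print(f"compression ratio: {ratio:.3f}x")
```

With far fewer blocks than parameters, the per-block scaling factors are a negligible overhead, so the ratio sits just below 32.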
Abstract
Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction on communication cost. However, its convergence is based on unrealistic assumptions and can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems for general gradient compressors is provided. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling…
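The ideas named in the abstract (blockwise 1-bit compression with a scaling factor, error feedback, and two-way compression) can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation. The abstract is truncated before it defines the scaling factor, so the mean absolute value of each block is used here as an assumption; the function names, the averaging step on the server, and the toy gradients are likewise hypothetical, and momentum is omitted for brevity.

```python
import numpy as np


def blockwise_sign_compress(vec, block_sizes):
    """Replace each block by its signs times one per-block scale."""
    assert sum(block_sizes) == vec.size
    out = np.empty_like(vec, dtype=float)
    start = 0
    for size in block_sizes:
        block = vec[start:start + size]
        scale = np.abs(block).mean()  # ASSUMPTION: scale = mean absolute value
        out[start:start + size] = scale * np.where(block >= 0, 1.0, -1.0)
        start += size
    return out


def error_feedback_step(grad, residual, block_sizes):
    """Compress (grad + residual) and keep what compression lost."""
    corrected = grad + residual
    compressed = blockwise_sign_compress(corrected, block_sizes)
    return compressed, corrected - compressed


def two_way_round(worker_grads, worker_residuals, server_residual, block_sizes):
    """One round: workers compress up, server averages and compresses down.

    worker_residuals is updated in place; the new server residual is returned.
    """
    sent = []
    for i, grad in enumerate(worker_grads):
        msg, worker_residuals[i] = error_feedback_step(
            grad, worker_residuals[i], block_sizes)
        sent.append(msg)
    average = np.mean(sent, axis=0)  # ASSUMPTION: server averages the messages
    return error_feedback_step(average, server_residual, block_sizes)


# Toy example: two workers, a 4-dimensional gradient, two blocks of size 2.
block_sizes = [2, 2]
grads = [np.array([0.5, -1.5, 2.0, -2.0]), np.array([1.5, -0.5, -2.0, 2.0])]
worker_residuals = [np.zeros(4) for _ in grads]
server_residual = np.zeros(4)
broadcast, server_residual = two_way_round(
    grads, worker_residuals, server_residual, block_sizes)
print(broadcast)
```

Each residual stores exactly the part of the signal that compression discarded, so it is added back before the next compression rather than being lost.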
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
