Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Shuai Zheng; Ziyue Huang; James T. Kwok

arXiv:1905.10936 · cs.LG · October 29, 2019 · 41 cites

TL;DR

This paper introduces a communication-efficient distributed momentum SGD method using blockwise gradient compression with error feedback, achieving significant communication reduction without sacrificing convergence or accuracy.

Contribution

It proposes a novel blockwise gradient compression technique with error feedback for distributed momentum SGD, providing convergence guarantees for nonconvex problems.

Findings

01 Achieves a 32x reduction in communication cost.

02 Converges as fast as full-precision SGD in experiments.

03 Reduces training time by 46% on ImageNet with ResNet.

Abstract

Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence analysis relies on unrealistic assumptions, and the method can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum. We consider two-way compression, which compresses the gradients both to and from workers. Convergence analysis on nonconvex problems for general gradient compressors is provided. By partitioning the gradient into blocks, a blockwise compressor is introduced such that each gradient block is compressed and transmitted in 1-bit format with a scaling…
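
The blockwise 1-bit compression and error feedback described in the abstract can be illustrated with a minimal single-worker sketch. This is not the authors' implementation. The abstract is truncated before it says how the per-block scaling is chosen, so the sketch assumes the scale is the mean absolute value of the block, a common choice for sign compressors. The function names (`compress_block`, `blockwise_compress`, `error_feedback_step`) are hypothetical, and momentum and two-way compression are left out for brevity.

```python
from typing import List, Sequence, Tuple

Compressed = Tuple[float, List[int]]  # (scale, one sign per entry)


def compress_block(block: Sequence[float]) -> Compressed:
    """Reduce one non-empty block to a single scale plus one sign per entry.

    ASSUMPTION: the scale is the mean absolute value of the block.
    The source abstract does not state the exact scaling.
    """
    scale = sum(abs(x) for x in block) / len(block)
    signs = [1 if x >= 0 else -1 for x in block]
    return scale, signs


def blockwise_compress(vec: Sequence[float], block_sizes: Sequence[int]) -> List[Compressed]:
    """Split vec into consecutive blocks and compress each independently."""
    out, start = [], 0
    for size in block_sizes:
        out.append(compress_block(vec[start:start + size]))
        start += size
    return out


def blockwise_decompress(blocks: Sequence[Compressed]) -> List[float]:
    """Rebuild the dense vector that the receiver would see."""
    flat: List[float] = []
    for scale, signs in blocks:
        flat.extend(scale * s for s in signs)
    return flat


def error_feedback_step(
    grad: Sequence[float],
    residual: Sequence[float],
    block_sizes: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """One error-feedback round: add the stored residual to the gradient,
    compress the result, and keep whatever the compressor lost."""
    corrected = [g + e for g, e in zip(grad, residual)]
    sent = blockwise_decompress(blockwise_compress(corrected, block_sizes))
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual
```

Calling `error_feedback_step(grad, [0.0] * len(grad), block_sizes)` on the first iteration and passing the returned residual into the next call carries the compression error forward instead of discarding it. Each entry costs one sign bit plus one shared scale per block, compared with 32 bits per entry for single-precision floats, which is where a figure close to 32x comes from.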

Code & Models

Repositories

ZiyueHuang/dist-ef-sgdm
mxnet · Official

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection