CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

Enda Yu; Dezun Dong; Yemao Xu; Shuo Ouyang; Xiangke Liao

arXiv:2106.10796 · cs.LG · September 8, 2021 · 1 cites

TL;DR

This paper introduces CD-SGD, a distributed stochastic gradient descent method that incorporates gradient compression with delay compensation to reduce communication overhead while maintaining convergence accuracy.

Contribution

The paper proposes a novel CD-SGD algorithm that effectively combines gradient compression with delay compensation to improve distributed training efficiency.

Findings

01 Reduces communication overhead in distributed training

02 Maintains convergence accuracy despite gradient compression

03 Demonstrates improved training speed in experiments

Abstract

Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reducing communication traffic. When combined with a parallel communication mechanism such as pipelining, gradient compression can greatly alleviate the impact of communication overhead. However, two problems with gradient compression remain to be solved. First, gradient compression introduces extra computation cost, which delays the next training iteration. Second, gradient compression usually reduces convergence accuracy.
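To make the trade-off in the abstract concrete, here is a minimal sketch of one common form of gradient compression, top-k sparsification with a local residual. It is a generic illustration and not necessarily the compression scheme used in CD-SGD; the function names `topk_compress` and `decompress` are hypothetical. The sort is the extra computation the abstract refers to, and the entries left in the residual are the information loss that can hurt convergence accuracy.

```python
from typing import List, Tuple

Payload = List[Tuple[int, float]]


def topk_compress(
    grad: List[float], residual: List[float], k: int
) -> Tuple[Payload, List[float]]:
    """Generic top-k sparsification (assumed scheme, not taken from the paper).

    Adds the residual carried over from earlier steps to the gradient,
    sends only the k largest-magnitude entries, and keeps the rest locally.
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    # Selecting the top k entries is the extra work compression introduces.
    order = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(order[:k])
    # Sparse payload: (index, value) pairs, which is what would be communicated.
    payload = [(i, corrected[i]) for i in sorted(keep)]
    # Entries that were not sent are accumulated for a later iteration.
    new_residual = [0.0 if i in keep else v for i, v in enumerate(corrected)]
    return payload, new_residual


def decompress(payload: Payload, size: int) -> List[float]:
    """Rebuild a dense gradient from the sparse payload; unsent entries are zero."""
    dense = [0.0] * size
    for i, v in payload:
        dense[i] = v
    return dense
```

With `k` much smaller than the gradient length, each worker communicates `k` index/value pairs instead of the full vector, at the cost of a selection step in every iteration.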

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques