RedSync : Reducing Synchronization Traffic for Distributed Deep Learning

Jiarui Fang; Haohuan Fu; Guangwen Yang; Cho-Jui Hsieh

arXiv:1808.04357·cs.DC·July 23, 2019

TL;DR

RedSync is a system that optimizes gradient communication in distributed deep learning, significantly reducing training time by implementing efficient residual gradient compression techniques on multi-GPU systems.

Contribution

This paper introduces RedSync, a system that applies optimized residual gradient compression to improve the efficiency of distributed DNN training across multiple GPUs.

Findings

01

RedSync reduces communication bandwidth significantly.

02

It improves training time on multi-GPU systems.

03

It is effective for models with a high communication-to-computation ratio.

Abstract

Data parallelism has become a dominant method to scale Deep Neural Network (DNN) training across multiple nodes. Since synchronizing a large number of gradients of the local model can be a bottleneck for large-scale distributed training, compressing communication data has gained widespread attention recently. Among several recently proposed compression algorithms, Residual Gradient Compression (RGC) is one of the most successful approaches: it can significantly compress the transmitted message size of each node (to 0.1% of the gradient size) and still achieve correct accuracy and the same convergence speed. However, the literature on compressing deep networks focuses almost exclusively on achieving a good theoretical compression rate, while the efficiency of RGC in real distributed implementations has been less investigated. In this paper, we develop an RGC-based system that is able to…
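The general idea behind residual gradient compression can be illustrated with a minimal sketch. It assumes top-k magnitude selection with a locally kept residual, which is a common formulation of RGC and not necessarily RedSync's exact method. The function name `rgc_step` and the parameter `k` are hypothetical, and plain Python lists stand in for GPU tensors.

```python
def rgc_step(gradient, residual, k):
    """One residual-gradient-compression step on a flat list of floats.

    Assumption: top-k selection by magnitude. Returns (message, new_residual),
    where message is a list of (index, value) pairs to be communicated.
    """
    # Add the fresh gradient to whatever was withheld in earlier steps.
    accumulated = [r + g for r, g in zip(residual, gradient)]
    # Indices of the k largest-magnitude accumulated entries.
    order = sorted(range(len(accumulated)),
                   key=lambda i: abs(accumulated[i]), reverse=True)
    selected = set(order[:k])
    # Only the selected entries are sent; the rest stay local as residual.
    message = [(i, accumulated[i]) for i in sorted(selected)]
    new_residual = [0.0 if i in selected else v
                    for i, v in enumerate(accumulated)]
    return message, new_residual


# Two consecutive steps on a 4-element gradient, sending one entry per step.
residual = [0.0, 0.0, 0.0, 0.0]
msg1, residual = rgc_step([0.5, -2.0, 0.25, 1.0], residual, k=1)
msg2, residual = rgc_step([0.75, 0.125, 0.0, -0.25], residual, k=1)
```

In this sketch, entries that are not sent are not discarded: they accumulate in `residual` and can be selected in a later step, so small gradients are delayed rather than lost. In a real system, the index-value pairs from all workers would then be exchanged and applied to the model.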
