RedSync: Reducing Synchronization Traffic for Distributed Deep Learning
Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh

TL;DR
RedSync is a system that optimizes gradient communication in distributed deep learning, significantly reducing training time by implementing efficient residual gradient compression techniques on multi-GPU systems.
Contribution
This paper introduces RedSync, a system that applies optimized residual gradient compression to improve the efficiency of distributed DNN training across multiple GPUs.
Findings
RedSync significantly reduces the communication bandwidth needed for gradient synchronization.
It shortens training time on multi-GPU systems.
It is effective for models with a high communication-to-computation ratio.
Abstract
Data parallelism has become a dominant method for scaling Deep Neural Network (DNN) training across multiple nodes. Because synchronizing the large number of gradients of the local model can be a bottleneck in large-scale distributed training, compressing the communicated data has recently gained widespread attention. Among the recently proposed compression algorithms, Residual Gradient Compression (RGC) is one of the most successful: it can compress the message each node transmits to 0.1% of the gradient size while preserving accuracy and convergence speed. However, the literature on compressing deep networks focuses almost exclusively on achieving a good theoretical compression rate, while the efficiency of RGC in real distributed implementations has been less investigated. In this paper, we develop an RGC-based system that is able to…
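The abstract names RGC without spelling out the mechanism. The following is a minimal sketch, assuming the common formulation of residual gradient compression in which each worker sends only its largest-magnitude gradient entries and carries the rest forward as a local residual. It is not RedSync's implementation; the function name `rgc_step` and the parameter `ratio` are hypothetical, and plain Python lists stand in for GPU tensors.

```python
# Illustrative sketch of residual gradient compression (RGC).
# Assumption: top-k selection by magnitude plus local residual accumulation.
# Names (rgc_step, ratio) are hypothetical and not taken from the paper.

def rgc_step(gradient, residual, ratio=0.001):
    """Return (message, new_residual) for one worker and one training step.

    gradient, residual: lists of floats of equal length.
    ratio: fraction of entries to transmit (0.001 matches the 0.1% figure).
    """
    # Add the fresh gradient to whatever was left unsent in earlier steps.
    accumulated = [r + g for r, g in zip(residual, gradient)]
    # Send at least one entry so that every step transmits something.
    k = max(1, int(len(accumulated) * ratio))
    # Indices of the k entries with the largest magnitude.
    top = sorted(range(len(accumulated)),
                 key=lambda i: abs(accumulated[i]),
                 reverse=True)[:k]
    # The transmitted message is a sparse list of (index, value) pairs.
    message = [(i, accumulated[i]) for i in sorted(top)]
    # Entries that were sent are cleared; the rest carry over as residual.
    new_residual = list(accumulated)
    for i in top:
        new_residual[i] = 0.0
    return message, new_residual


if __name__ == "__main__":
    residual = [0.0, 0.0, 0.0, 0.0]
    # ratio=0.25 on four entries means one entry is sent per step.
    msg1, residual = rgc_step([0.5, -2.0, 0.25, 0.125], residual, ratio=0.25)
    msg2, residual = rgc_step([0.5, 0.0, 0.25, 0.0], residual, ratio=0.25)
    print(msg1, msg2, residual)
```

In this toy run the small entry at index 0 is not sent in the first step, but it accumulates in the residual and is sent in the second. That is how the scheme avoids discarding small gradients outright. A real system would also need a sparse synchronization step across workers, which this sketch leaves out.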
