DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
Lin Zhang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li, Chengjian Liu

TL;DR
DeAR is a communication scheduling algorithm that splits each all-reduce into two operations, so that communication overlaps with both backpropagation and the next iteration's feed-forward computation, accelerating distributed deep learning training.
Contribution
The paper makes two contributions: a scheduling algorithm that decouples each all-reduce into two consecutive operations, and a tensor fusion method. Together they reduce communication latency and improve training speed.
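To see why tensor fusion matters when startup latency grows with the number of workers, the following sketch uses the standard alpha-beta cost model for a ring all-reduce. It is an illustration under stated assumptions, not the paper's fusion algorithm: the function names, the ring topology, and the values of `alpha` and `beta` are all hypothetical.

```python
def ring_all_reduce_time(num_bytes, workers, alpha, beta):
    """Alpha-beta estimate for one ring all-reduce.

    alpha: per-message startup latency in seconds (assumed value).
    beta: transfer time per byte in seconds (assumed value).
    """
    steps = 2 * (workers - 1)      # reduce-scatter steps plus all-gather steps
    chunk = num_bytes / workers    # each step moves one chunk of the tensor
    return steps * (alpha + chunk * beta)


def total_time(tensor_sizes, workers, alpha, beta, fuse=False):
    """Communication time for a list of gradient tensors, fused or not."""
    if fuse:
        tensor_sizes = [sum(tensor_sizes)]  # one buffer, one all-reduce
    return sum(ring_all_reduce_time(s, workers, alpha, beta) for s in tensor_sizes)


# Hypothetical setup: 100 gradient tensors of 40 KB each on 64 workers.
sizes = [40_000] * 100
unfused = total_time(sizes, workers=64, alpha=5e-6, beta=1e-9)
fused = total_time(sizes, workers=64, alpha=5e-6, beta=1e-9, fuse=True)
print(f"unfused: {unfused * 1e3:.2f} ms, fused: {fused * 1e3:.2f} ms")
```

In this model the bandwidth term is the same in both cases; fusion only removes repeated startup costs. Fusing everything into one buffer would also delay the start of communication, so a practical scheduler has to pick buffer sizes somewhere in between.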
Findings
Achieves up to 83% training speedup on Ethernet interconnects.
Achieves up to 15% training speedup on InfiniBand interconnects.
Effectively overlaps all-reduce with both backpropagation and feed-forward computations.
Abstract
Communication scheduling has been shown to be effective in accelerating distributed training, which enables all-reduce communications to be overlapped with backpropagation computations. This has been commonly adopted in popular distributed deep learning frameworks. However, there exist two fundamental problems: (1) excessive startup latency proportional to the number of workers for each all-reduce operation; (2) it only achieves sub-optimal training performance due to the dependency and synchronization requirement of the feed-forward computation in the next iteration. We propose a novel scheduling algorithm, DeAR, that decouples the all-reduce primitive into two continuous operations, which overlaps with both backpropagation and feed-forward computations without extra communications. We further design a practical tensor fusion algorithm to improve the training performance. Experimental…
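The decoupling idea can be illustrated with a small single-process simulation. The abstract excerpt above does not name the two operations; this sketch assumes the common decomposition of all-reduce into a reduce-scatter followed by an all-gather, and all function names are hypothetical. It only checks that the two halves together give the same result as a full all-reduce and that no extra data is exchanged; it does not model timing or scheduling.

```python
def reduce_scatter(worker_grads):
    """Worker i ends up with the element-wise sum of chunk i across all workers."""
    p = len(worker_grads)
    n = len(worker_grads[0])
    assert n % p == 0, "sketch assumes the gradient length divides evenly"
    chunk = n // p
    return [
        [sum(g[j] for g in worker_grads) for j in range(i * chunk, (i + 1) * chunk)]
        for i in range(p)
    ]


def all_gather(shards):
    """Every worker receives the concatenation of all workers' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


# Two simulated workers, each holding a gradient of four values.
grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]

# First half: could be overlapped with the remaining backpropagation.
shards = reduce_scatter(grads)

# Second half: could be overlapped with the next iteration's feed-forward pass.
result = all_gather(shards)
print(result)
```

Because the second half only needs to finish before the corresponding layer is used in the next forward pass, a scheduler has more room to hide it behind computation than it would with a single monolithic all-reduce.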
