MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
Shaohuai Shi, Xiaowen Chu, Bo Li

TL;DR
This paper introduces MG-WFBP, an optimized gradient merging method for distributed deep learning that reduces communication overhead and improves scalability, validated through extensive experiments on multiple GPU clusters.
Contribution
The paper proposes a novel gradient merging algorithm, MG-WFBP, that optimally combines short communication tasks to enhance distributed training efficiency.
Findings
MG-WFBP outperforms existing methods in scalability.
It effectively reduces communication time in distributed training.
Experiments on multiple GPU clusters confirm improved training speed and scalability.
Abstract
Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap communications with computations during the training process. In this paper, we observe that many DNNs have a large number of layers with only a small amount of data to be communicated at each layer in distributed training, which could make WFBP inefficient. Based on the fact that merging some short communication tasks into a single one can reduce the overall communication time, we formulate an optimization problem to minimize the training time in pipelining communications and computations. We derive an optimal solution that can be solved efficiently without affecting the training…
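The claim that merging short communication tasks can shorten an iteration can be illustrated with a toy timeline model. The sketch below is not the paper's algorithm. It assumes a linear communication cost of `alpha + beta * size` per message (a startup latency plus a per-unit transfer cost), assumes messages are sent one at a time, and finds the best merge pattern by brute force over a handful of layers. The names `iteration_time`, `best_merge`, `alpha`, and `beta` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def iteration_time(backward_times, grad_sizes, merge_flags, alpha, beta):
    """Finish time of the last gradient message in one backward pass.

    Layers are listed in backpropagation order. merge_flags[i] is True when
    layer i's gradient is held back and sent together with layer i+1's.
    Assumed cost model (not taken from the source): alpha + beta * size.
    """
    clock = 0.0      # time at which the current layer's backward pass ends
    comm_free = 0.0  # time at which the network becomes idle again
    pending = 0.0    # gradient volume buffered for the next message
    last_index = len(grad_sizes) - 1
    for i, (t_b, size) in enumerate(zip(backward_times, grad_sizes)):
        clock += t_b
        pending += size
        if i == last_index or not merge_flags[i]:
            # A message starts once its gradients exist and the link is free.
            start = max(clock, comm_free)
            comm_free = start + alpha + beta * pending
            pending = 0.0
    return comm_free


def best_merge(backward_times, grad_sizes, alpha, beta):
    """Brute-force search over all merge patterns (small layer counts only)."""
    best = None
    for flags in product([False, True], repeat=len(grad_sizes) - 1):
        t = iteration_time(backward_times, grad_sizes, flags, alpha, beta)
        if best is None or t < best[0]:
            best = (t, flags)
    return best


if __name__ == "__main__":
    times = [1.0, 1.0, 1.0, 1.0]   # per-layer backward time (made-up units)
    sizes = [1.0, 1.0, 1.0, 1.0]   # per-layer gradient volume (made-up units)
    alpha, beta = 2.0, 0.5         # illustrative cost parameters
    print("no merging:", iteration_time(times, sizes, (False,) * 3, alpha, beta))
    print("merge all: ", iteration_time(times, sizes, (True,) * 3, alpha, beta))
    print("best found:", best_merge(times, sizes, alpha, beta))
```

With these made-up numbers, sending every layer separately finishes at 11.0 time units, merging everything into one message finishes at 8.0, and the best pattern (send the first layer alone, merge the remaining three) finishes at 7.5. Neither extreme is best in this toy setting, which is why the choice of what to merge is worth treating as an optimization problem.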
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
