MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning

Shaohuai Shi; Xiaowen Chu; Bo Li

arXiv:1912.09268 · cs.DC · January 19, 2021

TL;DR

This paper introduces MG-WFBP, an optimized gradient merging method for distributed deep learning that reduces communication overhead and improves scalability, validated through extensive experiments on multiple GPU clusters.

Contribution

The paper proposes a novel gradient merging algorithm, MG-WFBP, that optimally combines short communication tasks to enhance distributed training efficiency.

Findings

01. MG-WFBP outperforms existing methods in scalability.

02. It effectively reduces communication time in distributed training.

03. Experimental results confirm improved training speed and scalability.

Abstract

Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap communications with computations during the training process. In this paper, we observe that many DNNs have a large number of layers with only a small amount of data to be communicated at each layer in distributed training, which could make WFBP inefficient. Based on the fact that merging some short communication tasks into a single one can reduce the overall communication time, we formulate an optimization problem to minimize the training time in pipelining communications and computations. We derive an optimal solution that can be solved efficiently without affecting the training…
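The abstract's claim that merging short communication tasks can reduce overall communication time can be illustrated with a small sketch. It assumes a linear cost model in which every message pays a fixed startup latency `alpha` plus a per-byte cost `beta`; that model, the function names, and all numbers below are illustrative assumptions, not taken from this page or from the authors' code.

```python
from typing import List, Sequence


def allreduce_time(nbytes: float, alpha: float, beta: float) -> float:
    """Assumed linear cost model: fixed startup latency plus per-byte time."""
    return alpha + beta * nbytes


def comm_finish_time(
    backward_times: Sequence[float],
    grad_bytes: Sequence[float],
    groups: Sequence[List[int]],
    alpha: float,
    beta: float,
) -> float:
    """Time at which the last gradient message completes.

    Layers are indexed in the order backpropagation visits them.
    Each group is a list of consecutive layer indices whose gradients
    are sent as one message, once the last layer in the group is ready.
    Messages share one link and are sent one after another.
    """
    # ready[i]: time at which layer i's gradient has been computed
    ready = []
    elapsed = 0.0
    for t in backward_times:
        elapsed += t
        ready.append(elapsed)

    link_free = 0.0
    for group in groups:
        start = max(link_free, ready[group[-1]])
        size = sum(grad_bytes[i] for i in group)
        link_free = start + allreduce_time(size, alpha, beta)
    return link_free


# Hypothetical 4-layer model: times in ms, sizes in bytes.
backward_times = [1.0, 1.0, 1.0, 1.0]
grad_bytes = [1000.0, 1000.0, 1000.0, 1000.0]
alpha, beta = 2.0, 0.0001  # startup latency dominates small messages

# One message per layer, as in plain wait-free backpropagation.
per_layer = comm_finish_time(
    backward_times, grad_bytes, [[0], [1], [2], [3]], alpha, beta
)
# All gradients merged into a single message.
merged = comm_finish_time(
    backward_times, grad_bytes, [[0, 1, 2, 3]], alpha, beta
)
print(f"per-layer: {per_layer:.1f} ms, merged: {merged:.1f} ms")
```

With these made-up numbers the per-layer schedule finishes at roughly 9.4 ms and the fully merged one at roughly 6.4 ms, because merging pays the startup latency once instead of four times. Under the same model, merging also delays the start of communication until the last layer in the group is ready, so with a small `alpha` and large gradients it can be slower; choosing which layers to merge is the optimization problem the abstract refers to.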

Code & Models

Repositories

HKBU-HPML/MG-WFBP (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM