Efficient Communications in Training Large Scale Neural Networks

Linnan Wang; Wei Wu; George Bosilca; Richard Vuduc; Zenglin Xu

arXiv:1611.04255·cs.DC·April 18, 2017·5 cites


TL;DR

This paper introduces Linear Pipelining, a new collective communication technique that significantly reduces communication costs in large-scale neural network training, enabling faster and more scalable parallel training on multi-GPU systems.

Contribution

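The paper presents Linear Pipelining (LP), a collective communication technique tuned to the message sizes that arise in Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD). Its theoretical cost is invariant to the number of GPUs $P$, whereas the cost of a conventional Minimum Spanning Tree (MST) collective grows like $O(\log P)$.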

Findings

01

The theoretical cost of LP is invariant to the number of GPUs, $P$

02

LP achieves up to 2x higher bandwidth than Bidirectional Exchange (BE) techniques

03

Applying LP to BSP-SGD reduces communication bottlenecks in practice
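
The first finding can be made concrete with a small simulation. This is a minimal sketch, not the paper's implementation: it assumes LP follows the usual chain-pipelining idea, in which the ranks form a chain 0 -> 1 -> ... -> P-1 and the message is split into blocks that each rank forwards as soon as it has received them. The function name `pipelined_chain_broadcast` and its parameters are hypothetical.

```python
def pipelined_chain_broadcast(message, num_ranks, block_size):
    """Simulate a broadcast from rank 0 along a chain of ranks.

    Assumption (not taken from the paper): in every step, each rank may
    forward one block to its right-hand neighbour, and all such transfers
    in a step happen concurrently.
    Returns (blocks held by each rank, number of steps taken).
    """
    blocks = [message[i:i + block_size]
              for i in range(0, len(message), block_size)]
    num_blocks = len(blocks)

    # received[r][b] is block b as seen by rank r, or None if not yet there.
    received = [[None] * num_blocks for _ in range(num_ranks)]
    received[0] = list(blocks)  # rank 0 is the root and owns the message

    steps = 0
    while any(blk is None for blk in received[-1]):
        # Collect all sends first so that a block moves one hop per step.
        sends = []
        for rank in range(num_ranks - 1):
            block_id = steps - rank  # block that this rank forwards now
            if 0 <= block_id < num_blocks:
                sends.append((rank + 1, block_id, received[rank][block_id]))
        for dst, block_id, data in sends:
            received[dst][block_id] = data
        steps += 1
    return received, steps


if __name__ == "__main__":
    msg = list(range(10))
    for p in (2, 4, 8):
        _, steps = pipelined_chain_broadcast(msg, num_ranks=p, block_size=1)
        print(f"P={p}: {steps} block-sized steps")
```

Under these assumptions, a message of k blocks reaches the last of P ranks after k + P - 2 steps, so each additional GPU adds one block-sized step rather than one full-message transfer. When k is much larger than P, the total is dominated by k.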

Abstract

We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, such as broadcasts of parameters or reductions for sub-gradient aggregation, which for large messages quickly dominate overall execution time and limit parallel scalability. To address this problem, we develop a new technique for collective operations, referred to as Linear Pipelining (LP). It is tuned to the message sizes that arise in BSP-SGD and works effectively on multi-GPU systems. Theoretically, the cost of LP is invariant to $P$, where $P$ is the number of GPUs, while the cost of the more conventional Minimum Spanning Tree (MST) scales like $O(\log P)$. LP also demonstrates up to 2x higher bandwidth than…
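
The scaling contrast between MST and LP can be illustrated with the textbook latency-bandwidth cost model. The sketch below is an assumption-laden illustration, not the paper's analysis: `alpha` (per-message latency), `beta` (seconds per byte), the block size, and both function names are hypothetical, and the formulas are the standard ones for a binomial-tree broadcast and a pipelined chain broadcast.

```python
def mst_broadcast_cost(n_bytes, p, alpha, beta):
    """Tree broadcast: ceil(log2 P) rounds, each sending the full message."""
    if p < 2:
        return 0.0
    rounds = (p - 1).bit_length()  # equals ceil(log2(p)) for p >= 1
    return rounds * (alpha + n_bytes * beta)


def pipelined_chain_cost(n_bytes, p, alpha, beta, block_bytes):
    """Chain broadcast with pipelining: (k + P - 2) block-sized steps."""
    if p < 2:
        return 0.0
    num_blocks = -(-n_bytes // block_bytes)  # ceiling division
    return (num_blocks + p - 2) * (alpha + block_bytes * beta)


if __name__ == "__main__":
    # Hypothetical link parameters, chosen only for illustration.
    alpha = 1e-5           # seconds of latency per message
    beta = 1e-9            # seconds per byte
    n = 256 * 1024 * 1024  # message size in bytes
    block = 1024 * 1024    # pipeline block size in bytes

    for p in (2, 4, 8):
        mst = mst_broadcast_cost(n, p, alpha, beta)
        lp = pipelined_chain_cost(n, p, alpha, beta, block)
        print(f"P={p}: tree {mst:.4f} s, pipelined chain {lp:.4f} s")
```

In this model the tree cost grows with the number of rounds, $\lceil \log_2 P \rceil$, while the pipelined cost changes only through the $P - 2$ extra block-sized steps, which become negligible once the message spans many blocks.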


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications