Efficient Communications in Training Large Scale Neural Networks
Linnan Wang, Wei Wu, George Bosilca, Richard Vuduc, Zenglin Xu

TL;DR
This paper introduces Linear Pipelining, a new collective communication technique that significantly reduces communication costs in large-scale neural network training, enabling faster and more scalable parallel training on multi-GPU systems.
Contribution
The paper presents Linear Pipelining, a novel collective operation optimized for BSP-SGD, with theoretical and practical advantages over existing methods, improving scalability and bandwidth efficiency.
Findings
The cost of LP is invariant to the number of GPUs, P
LP achieves up to 2x higher bandwidth than Bidirectional Exchange (BE) techniques
Applying LP to BSP-SGD reduces communication bottlenecks in practice
Abstract
We consider the problem of how to reduce the cost of communication that is required for the parallel training of a neural network. The state-of-the-art method, Bulk Synchronous Parallel Stochastic Gradient Descent (BSP-SGD), requires many collective communication operations, like broadcasts of parameters or reductions for sub-gradient aggregations, which for large messages quickly dominate overall execution time and limit parallel scalability. To address this problem, we develop a new technique for collective operations, referred to as Linear Pipelining (LP). It is tuned to the message sizes that arise in BSP-SGD, and works effectively on multi-GPU systems. Theoretically, the cost of LP is invariant to P, where P is the number of GPUs, while the cost of the more conventional Minimum Spanning Tree (MST) scales like O(log P). LP also demonstrates up to 2x faster bandwidth than…
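The scaling contrast in the abstract can be illustrated with a small cost model. The sketch below is not taken from the paper: it assumes the textbook latency-bandwidth (alpha-beta) model, in which sending m bytes costs alpha + m * beta, and it assumes that LP splits a message into fixed-size blocks that flow down a chain of P GPUs. The function names, the block size, and the alpha and beta values are all hypothetical choices made for illustration.

```python
def mst_broadcast_cost(n_bytes, p, alpha, beta):
    """Assumed model of a tree (MST) broadcast.

    The message is forwarded in ceil(log2 p) rounds, and every round
    moves the whole message, so the cost grows with log p.
    """
    rounds = (p - 1).bit_length()  # equals ceil(log2 p) for p >= 1
    return rounds * (alpha + n_bytes * beta)


def lp_broadcast_cost(n_bytes, p, alpha, beta, block_bytes):
    """Assumed model of a pipelined chain broadcast.

    The message is cut into blocks. The first block needs p - 1 hops to
    reach the last GPU, and each remaining block arrives one step later.
    """
    if p < 2:
        return 0.0
    blocks = -(-n_bytes // block_bytes)  # ceiling division
    steps = (p - 1) + (blocks - 1)
    return steps * (alpha + block_bytes * beta)


if __name__ == "__main__":
    # Illustrative values only: 256 MiB message, 10 microsecond latency,
    # 10 GB/s links, 1 MiB pipeline blocks.
    n, alpha, beta, block = 256 * 2**20, 1e-5, 1e-10, 2**20
    for p in (2, 4, 8, 16):
        mst = mst_broadcast_cost(n, p, alpha, beta)
        lp = lp_broadcast_cost(n, p, alpha, beta, block)
        print(f"P={p:2d}  MST={mst:.4f}s  LP={lp:.4f}s")
```

Under these assumptions the pipeline fill term (P - 1 steps) is small next to the number of blocks when the message is large, which is why the modeled LP cost stays nearly flat as P grows while the modeled tree cost grows with log P.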
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
