Integrated Model, Batch and Domain Parallelism in Training Neural Networks
Amir Gholami, Ariful Azad, Peter Jin, Kurt Keutzer, Aydın Buluç

TL;DR
This paper introduces an integrated approach that combines model, batch, and domain parallelism to train deep neural networks efficiently on large distributed-memory systems while reducing communication costs.
Contribution
It presents a novel matrix-based method that automatically combines these parallelism strategies and extends the scalability of batch-parallel training.
Findings
Often lower communication costs than pure (single-strategy) parallelism
Enhanced scalability of neural network training
Analytical demonstration of efficiency improvements
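To make the communication trade-off concrete, here is a toy bandwidth-only cost model of our own. It is not the paper's analysis. It assumes one square fully connected layer of width d, a minibatch of B samples, and P processes arranged as a P_r × P_c grid, with P_r splitting the weight rows (model parallelism) and P_c splitting the batch. Activations are all-gathered in the forward pass and input gradients reduce-scattered in the backward pass within each group of P_r processes, while weight gradients are all-reduced (ring algorithm) within each group of P_c processes. The names comm_words and best_grid are hypothetical.

```python
from typing import Tuple


def comm_words(p_r: int, p_c: int, d: int, batch: int) -> float:
    """Toy per-process communication volume (in words) for one d x d dense layer.

    Assumptions (ours, not the paper's): bandwidth only, no latency term;
    weight rows split over p_r, minibatch samples split over p_c.
    """
    # Model-parallel traffic inside each group of p_r processes:
    # forward all-gather of activations + backward reduce-scatter of input
    # gradients, each moving (p_r - 1) / p_r of a d x (batch / p_c) block.
    activation = 2 * (p_r - 1) / p_r * d * batch / p_c
    # Batch-parallel traffic inside each group of p_c processes:
    # ring all-reduce of the local (d / p_r) x d weight-gradient block.
    gradient = 2 * (p_c - 1) / p_c * d * d / p_r
    return activation + gradient


def best_grid(p: int, d: int, batch: int) -> Tuple[int, int]:
    """Return the (p_r, p_c) factorization of p with the lowest toy cost."""
    grids = [(p_r, p // p_r) for p_r in range(1, p + 1) if p % p_r == 0]
    return min(grids, key=lambda g: comm_words(g[0], g[1], d, batch))


if __name__ == "__main__":
    # Compare pure batch (1 x 64), mixed (8 x 8) and pure model (64 x 1) grids.
    for p_r in (1, 8, 64):
        print(p_r, 64 // p_r, comm_words(p_r, 64 // p_r, 1024, 1024))
    print("best grid:", best_grid(64, 1024, 1024))
```

With d = B = 1024 and P = 64, the toy cost is proportional to P_r + 64/P_r − 2, so the 8 × 8 grid moves 458752 words per process versus 2064384 for either pure strategy, a 4.5× reduction. Latency, convolutional layers and domain decomposition are deliberately left out of this sketch.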
Abstract
We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_r \times P_c$ grid where the $P_r$ dimension is implicitly responsible for model/domain parallelism and the $P_c$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure…
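As an illustration of a P_r × P_c process grid of this kind, the following minimal sketch maps a flat process rank to grid coordinates and shows which slice of a layer's weight rows (the model-parallel dimension P_r) and which samples of the minibatch (the batch dimension P_c) each process would own. Row-major rank ordering, near-equal block splits, and the helper names grid_coords, block_range and local_work are assumptions for illustration, not the paper's implementation; the sketch covers only the model-parallel split of weights, not domain decomposition of the inputs.

```python
from typing import Tuple

Range = Tuple[int, int]


def grid_coords(rank: int, p_r: int, p_c: int) -> Tuple[int, int]:
    """Map a flat rank to (row, col) in a p_r x p_c grid, assuming row-major order."""
    if not 0 <= rank < p_r * p_c:
        raise ValueError(f"rank {rank} outside a {p_r}x{p_c} grid")
    return divmod(rank, p_c)


def block_range(n: int, parts: int, idx: int) -> Range:
    """Half-open [start, stop) of block idx when n items are split into near-equal parts."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)
    stop = start + base + (1 if idx < extra else 0)
    return start, stop


def local_work(rank: int, p_r: int, p_c: int, d_out: int, batch: int) -> Tuple[Range, Range]:
    """Weight rows (split over p_r) and minibatch samples (split over p_c) owned by rank."""
    row, col = grid_coords(rank, p_r, p_c)
    return block_range(d_out, p_r, row), block_range(batch, p_c, col)


if __name__ == "__main__":
    # 2 x 2 grid, a layer with 6 output rows, a minibatch of 8 samples.
    for rank in range(4):
        rows, samples = local_work(rank, 2, 2, d_out=6, batch=8)
        print(f"rank {rank}: weight rows {rows}, samples {samples}")
```

Ranks that share a row index hold the same weight rows for different samples and would all-reduce their weight gradients; ranks that share a column index hold the same samples for different weight rows and would exchange activations. Setting P_r = 1 recovers pure batch parallelism and P_c = 1 pure model parallelism.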
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
