Integrated Model, Batch and Domain Parallelism in Training Neural Networks

Amir Gholami; Ariful Azad; Peter Jin; Kurt Keutzer; Aydin Buluc

arXiv:1712.04432 · cs.LG · May 17, 2018

Open Access

TL;DR

This paper introduces an integrated parallelism approach combining model, batch, and domain parallelism for efficient training of deep neural networks on large distributed systems, reducing communication costs.

Contribution

It presents a novel matrix-based integrated parallelism method that automatically combines different parallelism strategies and extends the scalability of batch parallel training.

Findings

01. Lower communication costs compared to pure parallelism methods
02. Enhanced scalability of neural network training
03. Analytical demonstration of efficiency improvements

Abstract

We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_{r} \times P_{c}$ grid where the $P_{r}$ dimension is implicitly responsible for model/domain parallelism and the $P_{c}$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure…
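
The $P_{r} \times P_{c}$ process grid described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names (`grid_shapes`, `grid_coords`, `describe`) and the row-major placement of ranks are assumptions made here for illustration. The sketch only enumerates the ways $P$ processes can be arranged and labels the two degenerate shapes; it does not model communication cost.

```python
def grid_shapes(p):
    """Return every (p_r, p_c) pair with p_r * p_c == p."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    return [(p_r, p // p_r) for p_r in range(1, p + 1) if p % p_r == 0]


def grid_coords(rank, p_r, p_c):
    """Map a flat process rank to (row, col) on a p_r x p_c grid.

    Row-major placement is an assumption of this sketch; the abstract
    does not say how ranks are laid out on the grid.
    """
    if not 0 <= rank < p_r * p_c:
        raise ValueError("rank is outside the grid")
    return divmod(rank, p_c)


def describe(p_r, p_c):
    """Label a grid shape using the roles the abstract gives each dimension."""
    if p_r == 1:
        return "pure batch parallelism"         # only the P_c dimension is used
    if p_c == 1:
        return "pure model/domain parallelism"  # only the P_r dimension is used
    return "integrated parallelism"


if __name__ == "__main__":
    # List every way to arrange 8 processes, from pure batch to pure model/domain.
    for p_r, p_c in grid_shapes(8):
        print(f"P_r={p_r}, P_c={p_c}: {describe(p_r, p_c)}")
```

For a fixed $P$, the candidate strategies are exactly the factor pairs returned by `grid_shapes`, so choosing a parallelization amounts to choosing one of these shapes.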

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications