Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

TL;DR
This paper introduces layer-wise parallelism for training deep CNNs: instead of applying one parallelization strategy (e.g., data or model parallelism) to the whole network, each layer uses its own strategy, which improves training throughput and scalability without sacrificing accuracy.
Contribution
The paper proposes layer-wise parallelism, which jointly optimizes how each layer is parallelized by solving a graph search problem, and reports that it outperforms state-of-the-art approaches that apply a single strategy to all layers.
Findings
Increases training throughput compared to state-of-the-art methods.
Reduces communication costs during distributed training.
Achieves better scalability to multiple GPUs while maintaining accuracy.
Abstract
The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better…
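To make the idea of choosing a strategy per layer concrete, here is a minimal sketch. It is not the paper's algorithm or cost model: it assumes a simple linear chain of three layers, two candidate strategies, and invented cost numbers, and it uses a plain dynamic program over the chain as a stand-in for the graph search the abstract mentions. The names `layer_cost`, `transfer_cost`, and `best_assignment` are hypothetical.

```python
# Toy illustration only: strategies, costs, and the chain-shaped network
# are assumptions made for this example, not values from the paper.
STRATEGIES = ("data", "model")

# layer_cost[i][s]: assumed cost of running layer i under strategy s
# (compute time plus parameter synchronization).
layer_cost = [
    {"data": 1.0, "model": 3.0},  # conv-like layer: few parameters, cheap to sync
    {"data": 1.5, "model": 3.5},  # conv-like layer
    {"data": 6.0, "model": 2.0},  # dense-like layer: many parameters, costly to sync
]

# transfer_cost[(a, b)]: assumed cost of moving activations between two
# adjacent layers that use strategies a and b.
transfer_cost = {
    ("data", "data"): 0.0,
    ("model", "model"): 0.0,
    ("data", "model"): 1.0,
    ("model", "data"): 1.0,
}


def best_assignment(layer_cost, transfer_cost):
    """Return (total_cost, per-layer strategies) minimizing cost over a chain."""
    # best[s] = (cost, plan) of the cheapest plan so far whose last layer uses s
    best = {s: (layer_cost[0][s], [s]) for s in STRATEGIES}
    for costs in layer_cost[1:]:
        new_best = {}
        for s in STRATEGIES:
            # Pick the predecessor strategy that makes ending in s cheapest.
            prev = min(STRATEGIES, key=lambda p: best[p][0] + transfer_cost[(p, s)])
            total = best[prev][0] + transfer_cost[(prev, s)] + costs[s]
            new_best[s] = (total, best[prev][1] + [s])
        best = new_best
    return min(best.values(), key=lambda entry: entry[0])


def uniform_cost(strategy):
    """Cost of applying one strategy to every layer (no transfer cost in this toy)."""
    return sum(costs[strategy] for costs in layer_cost)


cost, plan = best_assignment(layer_cost, transfer_cost)
print(plan, cost)
print({s: uniform_cost(s) for s in STRATEGIES})
```

With these made-up numbers, the mixed plan (data parallelism for the first two layers, model parallelism for the last) is cheaper than either uniform plan. That is the effect the abstract describes when it says different layers may prefer different parallelization strategies.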
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification
