PaSE: Parallelization Strategies for Efficient DNN Training
Venmugil Elango

TL;DR
This paper introduces an automated method to discover efficient parallelization strategies for deep neural network training, outperforming traditional data parallelism and expert-designed methods across various models.
Contribution
It proposes a novel algorithm that automatically finds efficient parallelization strategies from a DNN's computation graph, improving training performance and generalizing beyond the models that expert-designed strategies target.
Findings
The strategies found by the algorithm outperform data parallelism in all tested cases.
The approach yields better performance than expert-designed and state-of-the-art strategies.
The algorithm computes strategies efficiently within practical time.
Abstract
Training a deep neural network (DNN) requires substantial computational and memory resources. It is common to use multiple devices to train a DNN to reduce the overall training time. There are several choices to parallelize each layer in a DNN. Exhaustively searching this list to find an optimal parallelization strategy is prohibitively time-consuming and impractical. The standard practice is to use data parallelism because of its simplicity. However, data parallelism is often sub-optimal, and suffers from poor performance and high memory requirements. Expert-designed strategies have been proposed on a case-by-case basis using domain-specific knowledge. These expert-designed strategies do not generalize well to DNNs other than the ones for which they were designed, and are not necessarily the best choice. In this paper, we propose an approach to automatically find efficient…
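A rough count shows why exhaustive search over per-layer choices becomes impractical. The sketch below is illustrative only and is not the paper's cost model or algorithm: it assumes each layer has three splittable dimensions (batch, input, output), that every layer uses all available devices, and that layers are configured independently. The function names layer_configs and search_space_size are hypothetical.

```python
from itertools import product


def layer_configs(num_devices, num_dims=3):
    """List every way to split num_dims tensor dimensions so that the
    product of the split factors equals num_devices (assumption: each
    layer uses all devices)."""
    divisors = [d for d in range(1, num_devices + 1) if num_devices % d == 0]
    configs = []
    for split in product(divisors, repeat=num_dims):
        total = 1
        for factor in split:
            total *= factor
        if total == num_devices:
            configs.append(split)
    return configs


def search_space_size(num_layers, num_devices, num_dims=3):
    """Number of whole-network strategies when every layer picks its
    configuration independently (assumption: no pruning of any kind)."""
    return len(layer_configs(num_devices, num_dims)) ** num_layers


if __name__ == "__main__":
    per_layer = layer_configs(8)
    print(len(per_layer))            # configurations per layer on 8 devices
    print(search_space_size(20, 8))  # candidate strategies for 20 layers
```

Under these assumptions, 8 devices give 10 configurations per layer, so a 20-layer network already has 10^20 candidate strategies. Pure data parallelism corresponds to the single choice (8, 1, 1), splitting only the batch dimension, repeated at every layer.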
