Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement
Tianze Wang, Amir H. Payberah, Desta Haileselassie Hagos, Vladimir Vlassov

TL;DR
This paper investigates how the order of graph traversal in device placement affects training speed in model parallel neural network training, providing empirical insights and recommendations for different network types.
Contribution
It studies the impact of traversal order on device placement and offers practical guidelines to optimize training time across neural network architectures.
Findings
Traversal order significantly influences training time.
Optimal traversal depends on network type and graph features.
Recommendations improve model parallel training efficiency.
Abstract
Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making: they traverse the neural network graph and assign its neurons to different devices. This work studies the impact of graph traversal order on device placement. In particular, we empirically study how different graph traversal orders lead to different device placements, which in turn affect the training execution time. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. In this work, we…
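To make the idea of traversal order concrete, the following is a minimal sketch, not the authors' method. It assumes a hypothetical five-node computation graph (`GRAPH`) and a deliberately naive placement rule (`place_in_chunks`, which cuts the visiting order into contiguous blocks, one per device). All names here are illustrative assumptions. The sketch only shows that BFS, DFS, and topological orders can visit the same graph differently, so a sequential placer sees the nodes in a different sequence.

```python
from collections import deque

# Hypothetical toy computation graph: node -> list of successor nodes.
GRAPH = {
    "input": ["conv1", "conv2"],
    "conv1": ["add"],
    "conv2": ["add"],
    "add": ["dense"],
    "dense": [],
}


def bfs_order(graph, root):
    """Visit nodes level by level starting from root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs_preorder(graph, root):
    """Visit nodes depth first, recording each node on first visit."""
    order, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors reversed so the first-listed one is visited first.
        stack.extend(reversed(graph[node]))
    return order


def topological_order(graph):
    """Kahn's algorithm: a node is emitted only after all its predecessors."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    queue = deque(node for node in graph if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)
    return order


def place_in_chunks(order, num_devices):
    """Naive illustrative placer: contiguous blocks of the order per device."""
    chunk = -(-len(order) // num_devices)  # ceiling division
    return {node: index // chunk for index, node in enumerate(order)}


if __name__ == "__main__":
    for name, order in [
        ("bfs", bfs_order(GRAPH, "input")),
        ("dfs", dfs_preorder(GRAPH, "input")),
        ("topological", topological_order(GRAPH)),
    ]:
        print(name, order, place_in_chunks(order, num_devices=2))
```

On this toy graph, the BFS and DFS orders assign `add` and `conv2` to different devices under the same placement rule, which is the kind of order-dependent difference the abstract describes. A learned placer would replace `place_in_chunks` with its own policy.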
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Graph Neural Networks
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
