Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement

Tianze Wang; Amir H. Payberah; Desta Haileselassie Hagos; Vladimir Vlassov

arXiv:2201.09676·cs.LG·October 28, 2024

TL;DR

This paper investigates how the order of graph traversal in device placement affects training speed in model parallel neural network training, providing empirical insights and recommendations for different network types.

Contribution

It studies the impact of traversal order on device placement and offers practical guidelines to optimize training time across neural network architectures.

Findings

01 Traversal order significantly influences training time.

02 Optimal traversal depends on network type and graph features.

03 Recommendations improve model parallel training efficiency.

Abstract

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making: they traverse the neural network graph and assign its neurons to different devices. This work studies the impact of graph traversal order on device placement. In particular, we empirically study how different graph traversal orders lead to different device placements, which in turn affect the training execution time. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. In this work, we…
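To make the abstract's central idea concrete, here is a minimal sketch of how the traversal order alone can change a placement. Everything in it is an illustrative assumption rather than the paper's method: the toy graph and its node names are hypothetical, the placer simply cuts the traversal order into contiguous chunks (the paper studies learned placers), and the count of cross-device edges is only a crude stand-in for communication cost, not a measurement of training time.

```python
from collections import deque

# Hypothetical toy computation graph: node -> list of successor nodes.
GRAPH = {
    "input": ["conv1", "conv2"],
    "conv1": ["add"],
    "conv2": ["relu"],
    "relu": ["add"],
    "add": ["output"],
    "output": [],
}


def bfs_order(graph, root):
    """Visit nodes breadth-first, starting from root."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs_preorder(graph, root):
    """Visit nodes depth-first (preorder), starting from root."""
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors reversed so the first successor is visited first.
        stack.extend(reversed(graph[node]))
    return order


def topological_order(graph):
    """Kahn's algorithm: a node is emitted only after all its predecessors."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    queue = deque(node for node in graph if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for s in graph[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return order


def place_in_chunks(order, num_devices):
    """Naive placer (an assumption): contiguous chunks of the order per device."""
    chunk = -(-len(order) // num_devices)  # ceiling division
    return {node: i // chunk for i, node in enumerate(order)}


def cross_device_edges(graph, placement):
    """Count edges whose endpoints sit on different devices."""
    return sum(
        placement[src] != placement[dst]
        for src, successors in graph.items()
        for dst in successors
    )


if __name__ == "__main__":
    orders = {
        "bfs": bfs_order(GRAPH, "input"),
        "dfs": dfs_preorder(GRAPH, "input"),
        "topological": topological_order(GRAPH),
    }
    for name, order in orders.items():
        placement = place_in_chunks(order, num_devices=2)
        print(name, order, cross_device_edges(GRAPH, placement))
```

On this toy graph the depth-first order yields a different two-device split than the breadth-first and topological orders, which is the kind of order-dependent difference the abstract describes. Which order is preferable for a real network is an empirical question that this sketch does not answer.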

Code & Models

Repositories

bwhub/Graph_Traversal_Order_in_Device_Placement
tf · Official

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Graph Neural Networks

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings