NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism
Xin Ai, Hao Yuan, Zeyu Ling, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Chaoyi Chen, Yu Gu, Ge Yu

TL;DR
NeutronTP introduces a tensor parallelism approach for distributed GNN training that balances workload and reduces communication overhead, enabling efficient training of large-scale graphs across multiple GPUs.
Contribution
The paper proposes a novel tensor parallelism method for distributed GNN training that eliminates cross-worker dependencies and improves load balancing and efficiency.
Findings
Achieves up to 8.72x speedup over existing systems
Effectively trains large graphs exceeding single GPU memory
Reduces communication overhead through decoupled training framework
Abstract
Graph neural networks (GNNs) have emerged as a promising direction. Training on large-scale graphs, which relies on distributed computing power, poses new challenges. Existing distributed GNN systems leverage data parallelism by partitioning the input graph and distributing it to multiple workers. However, due to the irregular nature of the graph structure, existing distributed approaches suffer from unbalanced workloads and high overhead in managing cross-worker vertex dependencies. In this paper, we leverage tensor parallelism for distributed GNN training. GNN tensor parallelism eliminates cross-worker vertex dependencies by partitioning features instead of graph structures. Different workers are assigned training tasks on different feature slices of the same dimensional size, leading to a complete load balance. We achieve efficient GNN tensor parallelism through two critical functions.…
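The idea of partitioning features instead of graph structure can be illustrated with a small sketch. This is not NeutronTP's implementation: the toy graph, the feature values, and the function names (`aggregate`, `split_features`, `tensor_parallel_aggregate`) are all assumptions made for illustration, and the "workers" are simulated sequentially in one process. It covers only sum-based neighbor aggregation, which acts on each feature dimension independently, so every worker can process the whole graph on its own column slice without exchanging vertex data.

```python
# Toy graph (hypothetical): each vertex maps to the list of its in-neighbors.
NEIGHBORS = {0: [1, 2], 1: [0], 2: [0, 1, 3], 3: [2]}

# Vertex features (hypothetical): 4 vertices x 4 feature dimensions.
FEATURES = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]


def aggregate(features, neighbors):
    """Sum the feature rows of each vertex's neighbors."""
    width = len(features[0])
    out = []
    for v in range(len(features)):
        row = [0.0] * width
        for u in neighbors[v]:
            for d in range(width):
                row[d] += features[u][d]
        out.append(row)
    return out


def split_features(features, num_workers):
    """Cut the feature dimension into equal-width column slices, one per worker."""
    width = len(features[0])
    assert width % num_workers == 0, "sketch assumes an evenly divisible width"
    step = width // num_workers
    return [
        [row[w * step:(w + 1) * step] for row in features]
        for w in range(num_workers)
    ]


def tensor_parallel_aggregate(features, neighbors, num_workers):
    """Simulate workers that each aggregate over the full graph on one slice."""
    slices = split_features(features, num_workers)
    # Every simulated worker sees all vertices and edges, but only its columns.
    partials = [aggregate(s, neighbors) for s in slices]
    # Concatenate the per-worker column slices back into full-width rows.
    return [
        [x for part in partials for x in part[v]]
        for v in range(len(features))
    ]


full = aggregate(FEATURES, NEIGHBORS)
sliced = tensor_parallel_aggregate(FEATURES, NEIGHBORS, num_workers=2)
print(sliced == full)
```

Because every slice has the same number of columns and covers the same edges, each simulated worker performs the same amount of aggregation work regardless of how irregular the graph is.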
Taxonomy
Topics: Brain Tumor Detection and Classification · Advanced Neural Network Applications · Topic Modeling
