GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism
Jingji Chen, Zhuoming Chen, Xuehai Qian

TL;DR
GNNPipe introduces layer-level model parallelism for distributed GNN training, cutting communication volume by up to 22.89x and per-epoch training time by up to 2.45x while maintaining comparable model accuracy. It also supports a hybrid strategy that combines the approach with graph parallelism for large graphs.
Contribution
First to apply layer-level model parallelism to GNN training, enabling scalable distributed training with reduced communication and improved efficiency.
Findings
Reduces per-epoch training time by up to 2.45x
Decreases communication volume by up to 22.89x
Maintains comparable model accuracy and convergence speed
Abstract
Communication is a key bottleneck for distributed graph neural network (GNN) training. This paper proposes GNNPipe, a new approach that scales distributed full-graph deep GNN training. As the first to use layer-level model parallelism for GNN training, GNNPipe partitions GNN layers among GPUs: each device performs the computation for a disjoint subset of consecutive GNN layers on the whole graph. Compared to graph parallelism, in which each GPU handles a graph partition, GNNPipe reduces the communication volume by a factor of the number of GNN layers. GNNPipe overcomes the unique challenges of pipelined layer-level model parallelism on the whole graph by partitioning it into dependent chunks, allowing the use of historical vertex embeddings, and applying specific training techniques to ensure convergence. We also propose a hybrid approach by combining GNNPipe with graph parallelism…
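The layer partitioning described in the abstract can be illustrated with a minimal sketch. It only shows how layer indices might be split into disjoint runs of consecutive layers, one run per device. The function name `partition_layers` and the even-split policy are assumptions made for illustration; the abstract does not say how GNNPipe balances layers across GPUs.

```python
def partition_layers(num_layers: int, num_devices: int) -> list[list[int]]:
    """Split layer indices 0..num_layers-1 into consecutive, disjoint stages.

    Hypothetical helper: an even split is assumed here, with any leftover
    layers given to the earliest stages.
    """
    if num_devices < 1 or num_layers < num_devices:
        raise ValueError("need at least one layer per device")
    base, extra = divmod(num_layers, num_devices)
    stages = []
    start = 0
    for device in range(num_devices):
        size = base + (1 if device < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages


# Example: an 8-layer GNN on 4 GPUs; each GPU owns 2 consecutive layers.
stages = partition_layers(num_layers=8, num_devices=4)
for device, layers in enumerate(stages):
    print(f"device {device}: layers {layers}")
```

Under this layout, every device sees the whole graph but only its own layers, so embeddings cross devices only at the boundaries between consecutive stages rather than at every layer.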
Taxonomy
Topics: Advanced Graph Neural Networks · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
Methods: Graph Neural Network · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
