Characterizing and Understanding Distributed GNN Training on GPUs
Haiyang Lin, Mingyu Yan, Xiaocheng Yang, Mo Zou, Wenming Li, Xiaochun Ye, Dongrui Fan

TL;DR
This paper provides an in-depth analysis of distributed GNN training on GPUs, revealing key insights and guidelines to optimize performance for large-scale graph neural network training.
Contribution
It offers an in-depth analysis of distributed GNN training on GPUs, highlighting performance bottlenecks and guidelines for both software and hardware optimization.
Findings
Identifies key performance bottlenecks in distributed GNN training on GPUs.
Provides practical guidelines for software and hardware optimization.
Enhances understanding of distributed GNN training execution on GPU clusters.
Abstract
Graph neural network (GNN) has been demonstrated to be a powerful model in many domains for its effectiveness in learning over graphs. To scale GNN training for large graphs, a widely adopted approach is distributed training which accelerates training using multiple computing nodes. Maximizing the performance is essential, but the execution of distributed GNN training remains preliminarily understood. In this work, we provide an in-depth analysis of distributed GNN training on GPUs, revealing several significant observations and providing useful guidelines for both software optimization and hardware optimization.
Taxonomy
Topics: Advanced Graph Neural Networks · Ferroelectric and Negative Capacitance Devices · Machine Learning in Materials Science
