Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

Cunyang Wei; Siddharth Singh; Aishwarya Sarkar; Daniel Nichols; Tisha Patel; Aditya K. Ranjan; Sayan Ghosh; Ali Jannesari; Nathan R. Tallent; Abhinav Bhatele

arXiv:2604.02651 · cs.LG · April 6, 2026



TL;DR

ScaleGNN introduces a communication-free sampling method and 4D parallelism to significantly improve the scalability and efficiency of mini-batch GNN training on large GPU clusters.
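A 4D layout can be pictured as a grid of GPUs with one data-parallel axis and three axes for parallel matrix multiplication. The sketch below maps a global rank to 4D coordinates and lists the ranks that would form a collective group along one axis. It is only an illustration under stated assumptions: the ordering (data-parallel axis first, row-major ranks) and the function names `rank_to_coords` and `group_members` are hypothetical, and ScaleGNN may lay out its ranks differently.

```python
def rank_to_coords(rank, dims):
    """Map a global rank to (d, x, y, z) coordinates; the last axis varies fastest (row-major).

    Assumption: axis 0 is the data-parallel axis and axes 1-3 form the 3D PMM grid.
    """
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def group_members(rank, dims, axis):
    """Ranks that share every coordinate with `rank` except along `axis` (one collective group)."""
    total = 1
    for size in dims:
        total *= size
    mine = rank_to_coords(rank, dims)
    return [
        r for r in range(total)
        if all(a == b for i, (a, b) in enumerate(zip(mine, rank_to_coords(r, dims))) if i != axis)
    ]


# Usage: 16 GPUs as 2 data-parallel replicas, each replica a 2 x 2 x 2 PMM grid (hypothetical sizes).
dims = (2, 2, 2, 2)
coords_13 = rank_to_coords(13, dims)
dp_group_13 = group_members(13, dims, axis=0)  # replicas that would all-reduce gradients together
```

Under this assumed layout, rank 13 sits at coordinates (1, 1, 0, 1), and its data-parallel group is ranks 5 and 13, the two replicas holding the same PMM grid position.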

Contribution

The paper proposes ScaleGNN, a new 4D parallel framework for scalable mini-batch GNN training that combines communication-free distributed sampling, 3D parallel matrix multiplication (PMM), and data parallelism.
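The 3D PMM component can be illustrated with a serial simulation of the classic 3D matrix multiplication algorithm. Processes form a q × q × q grid, process (i, j, k) multiplies block A[i][k] by block B[k][j], and the partial products are summed (reduced) along the k axis. This is a minimal sketch that assumes square matrices whose size is divisible by q. The helper names (`matmul`, `split_blocks`, `pmm_3d`) are hypothetical, and the paper's actual distribution of GNN feature and adjacency matrices across its grid may differ.

```python
from itertools import product


def matmul(a, b):
    """Plain serial matrix product, used for each virtual process's local block multiply."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[r][t] * b[t][c] for t in range(m)) for c in range(p)] for r in range(n)]


def split_blocks(x, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks, indexed blocks[i][j]."""
    s = len(x) // q
    return [[[row[j * s:(j + 1) * s] for row in x[i * s:(i + 1) * s]] for j in range(q)]
            for i in range(q)]


def pmm_3d(a, b, q):
    """Serially simulate 3D PMM on a q x q x q virtual process grid."""
    n = len(a)
    s = n // q
    a_blocks, b_blocks = split_blocks(a, q), split_blocks(b, q)
    # Virtual process (i, j, k) holds A[i][k] and B[k][j] and computes one partial product.
    partial = {(i, j, k): matmul(a_blocks[i][k], b_blocks[k][j])
               for i, j, k in product(range(q), repeat=3)}
    # Reduction along k: C[i][j] = sum over k of A[i][k] @ B[k][j].
    c = [[0.0] * n for _ in range(n)]
    for (i, j, _k), block in partial.items():
        for r in range(s):
            for col in range(s):
                c[i * s + r][j * s + col] += block[r][col]
    return c


# Usage: 8 virtual processes on a 2 x 2 x 2 grid.
a = [[float(r * 4 + col) for col in range(4)] for r in range(4)]
b = [[float((r + col) % 3) for col in range(4)] for r in range(4)]
c = pmm_3d(a, b, q=2)
```

On real hardware each block would live on its own GPU and the k-axis sum would be a reduction collective. The serial loops here only model which process computes which partial product.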

Findings

1. Achieves a 3.5x speedup over state-of-the-art (SOTA) methods on ogbn-products.

2. Scales to 2048 GPUs with strong performance.

3. Significantly reduces communication overhead.

Abstract

Graph neural networks (GNNs) are widely used for learning on graph datasets derived from various real-world scenarios. Learning from extremely large graphs requires distributed training, and mini-batching with sampling is a popular approach for parallelizing GNN training. Existing distributed mini-batch approaches have significant performance bottlenecks due to expensive sampling methods and limited scaling when using data parallelism. In this work, we present ScaleGNN, a 4D parallel framework for scalable mini-batch GNN training that combines communication-free distributed sampling, 3D parallel matrix multiplication (PMM), and data parallelism. ScaleGNN introduces a uniform vertex sampling algorithm, enabling each process (GPU device) to construct its local mini-batch, i.e., subgraph partitions without any inter-process communication. 3D PMM enables scaling mini-batch training to much…
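One way uniform vertex sampling can avoid communication is a shared-seed scheme. If every process seeds its random number generator identically for a given iteration, all processes draw the same vertex sample independently. Each process then keeps only the edges of the sampled subgraph that already fall in its locally stored block. The sketch below assumes a 2D block partition of the edge list, a seed derived from the iteration number, and hypothetical helper names (`sample_vertices`, `partition_edges`, `local_minibatch`). It illustrates the idea under these assumptions and is not ScaleGNN's implementation.

```python
import random


def sample_vertices(num_vertices, batch_size, seed):
    """Uniform vertex sample without replacement; identical on every process for the same seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_vertices), batch_size))


def partition_edges(edges, num_vertices, grid):
    """Hypothetical 2D block partition: process (pr, pc) stores edges (u, v) with
    u in row block pr and v in column block pc."""
    size = -(-num_vertices // grid)  # ceiling division: vertices per block
    parts = {(pr, pc): [] for pr in range(grid) for pc in range(grid)}
    for u, v in edges:
        parts[(u // size, v // size)].append((u, v))
    return parts


def local_minibatch(local_edges, sampled):
    """Edges of the sampled (induced) subgraph that this process already stores,
    relabeled to mini-batch ids. Because `sampled` is sorted, the relabeling preserves
    order, so each process's block stays contiguous in mini-batch coordinates."""
    new_id = {v: i for i, v in enumerate(sampled)}  # global vertex id -> mini-batch id
    return [(new_id[u], new_id[v]) for u, v in local_edges if u in new_id and v in new_id]


# Usage: every process derives the seed from the iteration number, so no messages are exchanged.
num_vertices, grid, iteration = 8, 2, 3
edges = [(u, v) for u in range(num_vertices) for v in range(num_vertices)
         if (u + v) % 3 == 0 and u != v]
parts = partition_edges(edges, num_vertices, grid)
sampled = sample_vertices(num_vertices, batch_size=5, seed=iteration)  # same on all processes
local = {proc: local_minibatch(proc_edges, sampled) for proc, proc_edges in parts.items()}
```

Together, the per-process pieces form exactly the mini-batch a single process would build from the full edge list, which is the property that makes the sampling step communication-free in this sketch.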
