RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks
Arefin Niam, M S Q Zulkar Nine

TL;DR
RapidGNN introduces a deterministic sampling strategy that precomputes mini-batches. Knowing the batches in advance enables efficient caching and prefetching of remote features, which significantly reduces communication overhead and improves training throughput for large-scale GNNs.
Contribution
The paper proposes a deterministic sampling approach that anticipates feature access patterns, which is used to optimize communication and caching in distributed GNN training.
Findings
Achieves 2.10x faster training throughput on average
Reduces remote feature fetches by over 4x
Cuts energy consumption by up to 23%
Abstract
Graph Neural Networks (GNNs) have achieved state-of-the-art (SOTA) performance in diverse domains. However, training GNNs on large-scale graphs is challenging because of high memory demands and substantial communication overhead in distributed settings. Traditional sampling-based approaches reduce the computation load to some extent but often fail to address the communication inefficiencies inherent in distributed environments. This paper presents RapidGNN, which introduces a deterministic sampling strategy to precompute mini-batches. By leveraging this strategy, RapidGNN accurately anticipates feature access patterns, enabling optimal cache construction and timely prefetching of remote features. This reduces the frequency and latency of remote data transfers without compromising the stochastic nature of training. Evaluations on Reddit and OGBN-Products datasets demonstrate…
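The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the seed derivation, the toy graph, and the frequency-based cache policy are all assumptions made for illustration. It shows how seeding the sampler per epoch and batch makes mini-batches reproducible, so remote feature accesses can be counted ahead of time and the most frequently requested remote nodes cached.

```python
import random
from collections import Counter

def sample_batch(adj, seeds, fanout, seed):
    """One-hop neighbor sampling; the same seed always yields the same batch."""
    rng = random.Random(seed)
    nodes = set(seeds)
    for s in seeds:
        nbrs = adj[s]
        nodes.update(rng.sample(nbrs, min(fanout, len(nbrs))))
    return sorted(nodes)

def precompute_epoch(adj, train_nodes, batch_size, fanout, base_seed, epoch):
    """Precompute every mini-batch of an epoch from (base_seed, epoch) alone."""
    epoch_seed = base_seed * 1_000_003 + epoch  # hypothetical seed derivation
    order = list(train_nodes)
    random.Random(epoch_seed).shuffle(order)
    return [
        sample_batch(adj, order[i:i + batch_size], fanout, epoch_seed * 1_000_003 + i)
        for i in range(0, len(order), batch_size)
    ]

def build_cache(batches, local_nodes, capacity):
    """Assumed policy: cache the remote nodes that the precomputed batches touch most."""
    counts = Counter(n for b in batches for n in b if n not in local_nodes)
    return {n for n, _ in counts.most_common(capacity)}

def remote_fetches(batches, local_nodes, cache=frozenset()):
    """Count feature reads that are neither local nor served from the cache."""
    return sum(1 for b in batches for n in b
               if n not in local_nodes and n not in cache)

# Toy setup (hypothetical): 200 nodes, this worker owns nodes 0..99.
N = 200
adj = {i: [(i + 1) % N, (i - 1) % N, (7 * i) % N, (13 * i + 5) % N] for i in range(N)}
local_nodes = set(range(100))

batches = precompute_epoch(adj, range(100), batch_size=10, fanout=3, base_seed=42, epoch=0)
cache = build_cache(batches, local_nodes, capacity=20)
without_cache = remote_fetches(batches, local_nodes)
with_cache = remote_fetches(batches, local_nodes, cache)
print(f"remote fetches: {without_cache} without cache, {with_cache} with cache")
```

Because the batches depend only on the seed, every worker can regenerate the same schedule and fetch upcoming remote features before the trainer needs them. Changing `base_seed` or `epoch` still produces a different random schedule, so the sampling remains pseudo-random across epochs and runs.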
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Big Data and Digital Economy
