Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs
Jialin Dong, Da Zheng, Lin F. Yang, George Karypis

TL;DR
This paper introduces Global Neighborhood Sampling, a novel method for efficient GNN training on large-scale graphs using mixed CPU-GPU setups, reducing data transfer and improving performance without sacrificing accuracy.
Contribution
It proposes a global cache-based sampling algorithm tailored for mixed-CPU-GPU training on giant graphs, with an efficient implementation and theoretical convergence guarantees.
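To make the cache-based idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a cache of nodes chosen with probability weighted by degree, and a neighbor sampler that prefers cached neighbors and falls back to uncached ones only when the cache cannot fill the fanout. The names `build_cache` and `sample_neighbors`, the degree-weighted policy, and the fallback rule are all illustrative assumptions, and the sketch omits any importance weighting that an unbiased estimator would need.

```python
import random


def build_cache(adj, cache_size, rng):
    """Choose cache nodes without replacement, weighted by degree.

    Assumed policy for illustration: each node gets the key u ** (1 / w)
    with u uniform in [0, 1) and w its degree; the largest keys win.
    """
    keyed = [
        (rng.random() ** (1.0 / max(len(nbrs), 1)), node)
        for node, nbrs in adj.items()
    ]
    keyed.sort(reverse=True)
    return {node for _, node in keyed[:cache_size]}


def sample_neighbors(adj, node, fanout, cache, rng):
    """Sample up to `fanout` neighbors, taking cached neighbors first."""
    nbrs = adj[node]
    cached = [n for n in nbrs if n in cache]
    uncached = [n for n in nbrs if n not in cache]

    # Cached neighbors stand in for features already resident on the GPU.
    picked = rng.sample(cached, min(fanout, len(cached)))

    # Fall back to uncached neighbors only if the cache cannot fill the fanout.
    missing = fanout - len(picked)
    if missing > 0 and uncached:
        picked += rng.sample(uncached, min(missing, len(uncached)))
    return picked


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy star graph: node 0 is connected to nodes 1..4.
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    cache = build_cache(adj, cache_size=2, rng=rng)
    print("cache:", sorted(cache))
    print("sampled:", sample_neighbors(adj, 0, fanout=3, cache=cache, rng=rng))
```

In this sketch, every sampled neighbor that is already in the cache is one fewer feature vector to copy from CPU to GPU memory, which is the data movement the paper identifies as the bottleneck.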
Findings
Outperforms baseline neighbor sampling by 2X-4X on giant graphs.
Outperforms LADIES by 2X-14X while maintaining higher accuracy.
Achieves a convergence rate comparable to that of traditional methods.
Abstract
Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in various applications such as social network recommendation, fraud detection, and graph search. The graphs in these applications are typically large, usually containing hundreds of millions of nodes. Training GNN models on such large graphs efficiently remains a big challenge. Although a number of sampling-based methods have been proposed to enable mini-batch training on large graphs, these methods have not been proven to work on truly industry-scale graphs, which require GPUs or mixed-CPU-GPU training. The state-of-the-art sampling-based methods are usually not optimized for these real-world hardware setups, in which data movement between CPUs and GPUs is a bottleneck. To address this issue, we propose Global Neighborhood Sampling that aims at training GNNs on giant graphs specifically for…
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Stochastic Gradient Optimization Techniques
