TL;DR
Ginex is a novel SSD-based GNN training system that enables billion-scale graph training on a single machine by optimizing in-memory caching to reduce data movement bottlenecks.
Contribution
Ginex restructures the GNN training pipeline and introduces a provably optimal caching algorithm, enabling efficient billion-scale GNN training on a single machine.
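To make "provably optimal caching" concrete, here is a minimal sketch of Belady's MIN policy, the classic offline replacement rule that is optimal when the full access sequence is known in advance. This summary does not say which algorithm Ginex uses, so treating it as Belady-style is an assumption made only for illustration. The function name `belady_misses` and the toy trace are hypothetical.

```python
from collections import defaultdict, deque


def belady_misses(accesses, capacity):
    """Count cache misses under Belady's MIN policy for a known access trace.

    Illustrative assumption: the whole trace is available up front,
    which is what makes an optimal eviction decision possible.
    """
    # For every key, record the positions at which it is accessed.
    positions = defaultdict(deque)
    for i, key in enumerate(accesses):
        positions[key].append(i)

    cache = set()
    misses = 0
    for key in accesses:
        positions[key].popleft()  # consume the current access
        if key in cache:
            continue  # hit: nothing to load
        misses += 1
        if capacity == 0:
            continue  # nothing can be cached
        if len(cache) >= capacity:
            # Evict the cached key whose next use is farthest away,
            # treating "never used again" as infinitely far.
            def next_use(k):
                return positions[k][0] if positions[k] else float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return misses


if __name__ == "__main__":
    # Toy trace of feature-vector IDs with a two-entry cache.
    print(belady_misses(list("abcabc"), capacity=2))
```

On the trace `a b c a b c` with two cache slots this policy incurs 4 misses, whereas least-recently-used eviction would miss on all 6 accesses. The catch is that the policy needs the future access order, so it only applies when that order can be determined before the data is fetched.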
Findings
Achieves 2.11x higher training throughput on average than SSD-extended PyTorch Geometric.
Effectively processes four billion-scale graph datasets on a single machine.
Demonstrates the effectiveness of optimal in-memory caching in SSD-based GNN training.
Abstract
Recently, Graph Neural Networks (GNNs) have been in the spotlight as a powerful tool that can effectively serve various inference tasks on graph-structured data. As the size of real-world graphs continues to scale, the GNN training system faces a scalability challenge. Distributed training is a popular approach to address this challenge by scaling out CPU nodes. However, not much attention has been paid to disk-based GNN training, which can scale up the single-node system in a more cost-effective manner by leveraging high-performance storage devices like NVMe SSDs. We observe that the data movement between the main memory and the disk is the primary bottleneck in the SSD-based training system, and that the conventional GNN training pipeline is sub-optimal without taking this overhead into account. Thus, we propose Ginex, the first SSD-based GNN training system that can process…
