ATLAS: Efficient Out-of-Core Inference for Billion-Scale Graph Neural Networks
Pranjal Naman, Yogesh Simmhan

TL;DR
ATLAS is a disk-based GNN inference framework that enables efficient full-graph, layer-wise inference on billion-scale graphs exceeding memory capacity, achieving 12-30x speedup over state-of-the-art methods.
Contribution
ATLAS introduces a broadcast-based execution model and a tiered memory-disk hierarchy to significantly improve out-of-core GNN inference efficiency.
Findings
Achieves 12-30x faster inference than existing out-of-core baselines.
Supports graphs with up to 4 billion edges and 550 GiB of features.
Maintains high throughput with only 128 GiB of RAM and a 2 TiB SSD.
Abstract
Graph Neural Network (GNN) inference on billion-scale graphs is critical for domains like fintech and recommendation systems. Full-graph inference on these large graphs can be challenging due to high communication costs in distributed settings and high I/O costs in disk-backed Out-of-Core (OOC) settings. Existing OOC systems, operating across disk and memory, primarily focus on GNN training and perform poorly for full-graph inference due to massive read amplification, irregular I/O, and memory pressure. We present ATLAS, a disk-based GNN inference framework that enables efficient full-graph, layer-wise inference on graphs whose topologies, features and intermediate embeddings exceed the available memory on single machines. ATLAS replaces gather-based execution with a broadcast-based model that enables sequential, single-pass streaming reads of features and embeddings per layer. A tiered…
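The contrast between gather-based and broadcast-based execution described in the abstract can be illustrated with a toy sketch. This is not the ATLAS implementation: the function names (`gather_aggregate`, `broadcast_aggregate`, `stream_chunks`), the sum aggregator, the in-memory accumulator array, and the five-edge example graph are all assumptions made for illustration. The only point it shows is that pushing each source vertex's feature to its out-neighbours lets the features be read once, in vertex-ID order, while producing the same per-layer aggregate as pulling from in-neighbours.

```python
import numpy as np

def gather_aggregate(features, in_neighbors):
    """Pull model: each destination reads its in-neighbours' features.
    On disk, these reads would land at arbitrary offsets (random I/O)."""
    out = np.zeros_like(features)
    for dst, srcs in enumerate(in_neighbors):
        for src in srcs:
            out[dst] += features[src]
    return out

def stream_chunks(features, chunk_size):
    """Hypothetical stand-in for a sequential, single-pass read from disk."""
    for start in range(0, len(features), chunk_size):
        yield start, features[start:start + chunk_size]

def broadcast_aggregate(feature_chunks, out_neighbors, num_vertices, dim):
    """Push model: features arrive once, in vertex-ID order, and each one is
    added to the accumulators of that vertex's out-neighbours.
    Assumption: accumulators fit in memory here; the sketch does not model
    how a real out-of-core system would manage them."""
    acc = np.zeros((num_vertices, dim), dtype=np.float64)
    for start, chunk in feature_chunks:
        for offset, feat in enumerate(chunk):
            dsts = out_neighbors[start + offset]
            if dsts:  # skip vertices with no outgoing edges
                np.add.at(acc, dsts, feat)
    return acc

# Toy directed graph given as (src, dst) pairs; purely illustrative.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
num_vertices, dim = 4, 2
features = np.arange(num_vertices * dim, dtype=np.float64).reshape(num_vertices, dim)

in_neighbors = [[] for _ in range(num_vertices)]
out_neighbors = [[] for _ in range(num_vertices)]
for src, dst in edges:
    in_neighbors[dst].append(src)
    out_neighbors[src].append(dst)

pulled = gather_aggregate(features, in_neighbors)
pushed = broadcast_aggregate(stream_chunks(features, chunk_size=2),
                             out_neighbors, num_vertices, dim)
print(np.array_equal(pulled, pushed))
```

In the pull version, the access pattern over `features` is dictated by the graph topology; in the push version, `features` is consumed strictly front to back, and the irregular accesses move to the accumulator array instead.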
