LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

Jeongmin Brian Park; Kun Wu; Vikram Sharma Mailthody; Zaid Qureshi; Scott Mahlke; Wen-mei Hwu

arXiv:2407.15264 · cs.DC · March 31, 2026 · 1 citation

TL;DR

LSM-GNN is a storage-based multi-GPU framework that optimizes data transfer and caching to efficiently train large-scale GNNs, outperforming traditional partitioning methods.

Contribution

LSM-GNN introduces a novel communication layer, a hybrid eviction policy, and a prefetching mechanism to reduce data-transfer overheads and improve performance in multi-GPU GNN training.
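This summary does not say how the hybrid eviction policy works, so the following is only a minimal sketch of the general idea under stated assumptions, not LSM-GNN's implementation. It assumes, hypothetically, that "hybrid" means blending a static per-node score (for example, a precomputed importance such as normalized degree) with a dynamic recency signal. The class `HybridEvictionCache`, the `static_score` mapping, the `alpha` weight, and the `load_fn` callback are all illustrative names that do not come from the paper.

```python
class HybridEvictionCache:
    """Toy node-feature cache (hypothetical; not the LSM-GNN design).

    Eviction priority = alpha * static_score + (1 - alpha) * recency,
    and the entry with the lowest priority is evicted on a miss.
    """

    def __init__(self, capacity, static_score, alpha=0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.static_score = static_score  # node_id -> score in [0, 1] (assumed)
        self.alpha = alpha                # weight of the static signal
        self.entries = {}                 # node_id -> cached feature
        self.last_used = {}               # node_id -> tick of last access
        self.tick = 0

    def _priority(self, node_id):
        # Recency lies in (0, 1]; an entry touched on the current tick scores 1.
        recency = self.last_used[node_id] / self.tick
        static = self.static_score.get(node_id, 0.0)
        return self.alpha * static + (1 - self.alpha) * recency

    def get(self, node_id, load_fn):
        """Return (feature, hit). On a miss, load_fn stands in for a slow read."""
        self.tick += 1
        if node_id in self.entries:
            self.last_used[node_id] = self.tick
            return self.entries[node_id], True
        feature = load_fn(node_id)
        if len(self.entries) >= self.capacity:
            # Evict the resident entry with the lowest blended priority.
            victim = min(self.entries, key=self._priority)
            del self.entries[victim]
            del self.last_used[victim]
        self.entries[node_id] = feature
        self.last_used[node_id] = self.tick
        return feature, False


if __name__ == "__main__":
    # Node 0 has a high static score, so it survives even though it is
    # the least recently used entry when node 2 arrives.
    cache = HybridEvictionCache(capacity=2, static_score={0: 1.0}, alpha=0.5)
    load = lambda n: [float(n)]  # placeholder for a storage read
    for node in (0, 1, 2):
        cache.get(node, load)
    print(sorted(cache.entries))
```

In this toy run a pure least-recently-used policy would have evicted node 0, whereas the blended priority evicts node 1 instead. Setting `alpha=0` reduces the sketch to recency-only eviction, which makes the effect of the static term easy to compare.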

Findings

01. LSM-GNN achieves up to 3.75x speedup over baseline.

02. Single-node two-GPU setup outperforms multi-node configurations.

03. The framework effectively manages cache and prefetching to handle large-scale GNNs.

Abstract

Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real-world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings, which often exceeds the memory capacity of the target GPUs used for training. To address limited memory capacity, traditional GNN training approaches use graph partitioning and sharding techniques to scale up across multiple GPUs within a node and/or scale out across multiple nodes. However, this approach suffers from the high computational cost of graph partitioning algorithms and from inefficient communication across GPUs. To address these overheads, we propose the Large-scale Storage-based Multi-GPU GNN framework (LSM-GNN), a storage-based approach to training GNN models that utilizes a novel communication layer enabling GPU software caches to…
