Reducing Memory Contention and I/O Congestion for Disk-based GNN Training
Qisheng Jiang, Lei Jia, Chundong Wang

TL;DR
GNNDrive is a novel system that reduces memory contention and I/O congestion in disk-based GNN training, significantly improving training speed on large graphs by optimizing buffer management and asynchronous data loading.
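Asynchronous data loading means overlapping storage reads with computation. Below is a minimal, generic Python sketch of that idea. It is not GNNDrive's implementation. A loader thread reads the feature rows for upcoming mini-batches from a raw float32 file on disk, and the main thread consumes earlier batches in the meantime. All of the following are hypothetical stand-ins:

- the on-disk layout;
- the function names (`write_features`, `read_rows`, `loader`, `train`);
- the "training step", which only sums each batch's features.

The bounded queue caps how many loaded batches sit in memory at once. That is one simple way to keep prefetching from competing with other data for memory.

```python
import array
import os
import queue
import tempfile
import threading

DIM = 4                             # hypothetical feature dimension
ITEM = array.array("f").itemsize    # bytes per float32 value

def write_features(path, num_nodes, dim=DIM):
    """Write a toy feature matrix as raw float32; row i is filled with float(i)."""
    feats = array.array("f", [float(i) for i in range(num_nodes) for _ in range(dim)])
    with open(path, "wb") as f:
        feats.tofile(f)

def read_rows(f, node_ids, dim=DIM):
    """Gather the feature rows of node_ids from an open file (one seek + read per row)."""
    out = []
    for nid in node_ids:
        f.seek(nid * dim * ITEM)
        row = array.array("f")
        row.frombytes(f.read(dim * ITEM))
        out.append(row)
    return out

def loader(path, batches, q, dim=DIM):
    """Producer thread: load each batch's features from disk and hand them over."""
    with open(path, "rb") as f:
        for batch in batches:
            q.put((batch, read_rows(f, batch, dim)))  # blocks while the queue is full
    q.put(None)  # sentinel: no more batches

def train(path, batches, prefetch=2, dim=DIM):
    """Consumer: process batch k while the loader is already reading batch k+1."""
    q = queue.Queue(maxsize=prefetch)  # bounded: limits memory held by in-flight batches
    t = threading.Thread(target=loader, args=(path, batches, q, dim), daemon=True)
    t.start()
    results = []
    while (item := q.get()) is not None:
        _batch, rows = item
        # Hypothetical stand-in for a GNN training step: sum the batch's features.
        results.append(sum(sum(r) for r in rows))
    t.join()
    return results

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "feats.bin")
        write_features(path, num_nodes=10)
        print(train(path, [[0, 1], [2, 3, 4], [9]]))  # prints [4.0, 36.0, 36.0]
```

A real disk-based trainer would also sample subgraphs and move data to the GPU. The sketch keeps only the overlap of I/O and compute.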
Contribution
The paper introduces GNNDrive, which minimizes the memory footprint and relieves I/O congestion, enabling faster GNN training on large-scale graphs than existing systems.
Findings
GNNDrive outperforms state-of-the-art systems by up to 16.9x in training speed.
It effectively reduces memory contention and I/O congestion.
Experimental results demonstrate significant performance improvements.
Abstract
Graph neural networks (GNNs) have gained wide popularity. Large graphs with high-dimensional features have become common, and training GNNs on them is non-trivial on an ordinary machine. Given a gigantic graph, even sample-based GNN training cannot work efficiently, since it is difficult to keep the graph's entire data in memory during training. Leveraging a solid-state drive (SSD) or other storage devices to extend the memory space has been studied for training GNNs. Memory and I/O are hence critical for effective disk-based training. We find that state-of-the-art (SoTA) disk-based GNN training systems suffer severely from issues such as memory contention between a graph's topological and feature data, and I/O congestion when loading data from SSD for training. We accordingly develop GNNDrive. GNNDrive 1) minimizes the memory footprint with holistic buffer management across…
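The abstract is truncated before it details GNNDrive's buffer management. The sketch below therefore shows only the general idea of a bounded feature buffer, under these assumptions:

- the buffer holds a fixed budget of `capacity` feature rows, keyed by node ID;
- rows are evicted least-recently-used (LRU) first;
- a hypothetical `load_row` callable stands in for an SSD read.

LRU is a swapped-in policy chosen for brevity, not necessarily what GNNDrive uses. The `disk_reads` counter shows how reusing buffered rows across mini-batches avoids repeated I/O.

```python
from collections import OrderedDict

class FeatureBuffer:
    """Hypothetical fixed-capacity feature buffer with LRU eviction.

    Capping the buffer at `capacity` rows bounds the memory spent on feature
    data, leaving the remainder for other data such as graph topology.
    """

    def __init__(self, capacity, load_row):
        self.capacity = capacity
        self.load_row = load_row  # hypothetical callable: node_id -> feature row (e.g. an SSD read)
        self.rows = OrderedDict() # node_id -> row, ordered from least to most recently used
        self.disk_reads = 0

    def get(self, node_id):
        if node_id in self.rows:
            self.rows.move_to_end(node_id)  # hit: mark as most recently used
            return self.rows[node_id]
        row = self.load_row(node_id)        # miss: fetch from storage
        self.disk_reads += 1
        self.rows[node_id] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)   # evict the least recently used row
        return row

    def gather(self, node_ids):
        """Collect the feature rows for one mini-batch."""
        return [self.get(n) for n in node_ids]

if __name__ == "__main__":
    buf = FeatureBuffer(capacity=3, load_row=lambda n: [float(n)] * 4)
    buf.gather([1, 2, 3])   # three misses
    buf.gather([2, 3, 4])   # 2 and 3 hit; 4 misses and evicts 1
    buf.gather([1])         # 1 was evicted, so it misses again
    print(buf.disk_reads)   # prints 5
```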
Taxonomy
Topics: Parallel Computing and Optimization Techniques
Methods: GraphSAGE
