Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training
Jie Sun, Li Su, Zuocheng Shi, Wenting Shen, Zeke Wang, Lei Wang, Jie Zhang, Yong Li, Wenyuan Yu, Jingren Zhou, Fei Wu

TL;DR
Legion is a novel multi-GPU system that enhances billion-scale GNN training efficiency through hierarchical graph partitioning, a unified cache, and adaptive cache management, enabling single-machine training of large graphs.
Contribution
This work introduces Legion, a system with innovative cache and partitioning strategies that significantly improve multi-GPU GNN training at billion scale.
Findings
Supports training billion-scale GNNs on a single machine
Outperforms existing cache-based systems on small graphs
Achieves higher training throughput across various datasets
Abstract
Graph neural networks (GNNs) have been widely applied in real-world applications, such as product recommendation on e-commerce platforms and risk control in financial management systems. Several cache-based GNN systems have been built to accelerate GNN training on a single machine with multiple GPUs. However, these systems fail to train on billion-scale graphs efficiently, which is a common challenge in industry. In this work, we propose Legion, a system that automatically pushes the envelope of multi-GPU systems for accelerating billion-scale GNN training. First, we design a hierarchical graph partitioning mechanism that significantly improves multi-GPU cache performance. Second, we build a unified multi-GPU cache that helps to minimize the PCIe traffic incurred by caching both the graph topology and the features with the highest hotness. Third, we develop an automatic caching management…
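The abstract states that the unified cache holds both topology and features with the highest hotness, but this excerpt does not say how GPU memory is divided between the two. The sketch below is a hypothetical illustration of that trade-off, not Legion's actual algorithm. It assumes per-node access counts ("hotness") are already known, for instance from a few sampling passes run beforehand. It scans candidate splits of one GPU's cache budget, fills each part greedily in descending hotness order, and keeps the split with the lowest estimated PCIe traffic (access count times bytes, summed over uncached items). All function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def greedy_fill(order: List[int], sizes: List[int], budget: int) -> List[int]:
    """Cache items in the given (hottest-first) order, skipping any that no longer fit."""
    used, chosen = 0, []
    for v in order:
        if used + sizes[v] <= budget:
            chosen.append(v)
            used += sizes[v]
    return chosen


def pcie_traffic(hotness: List[int], sizes: List[int], cached: List[int]) -> int:
    """Estimated bytes fetched over PCIe: every access to an uncached item costs its size."""
    cached_set = set(cached)
    return sum(h * s for v, (h, s) in enumerate(zip(hotness, sizes)) if v not in cached_set)


def plan_unified_cache(
    topo_hot: List[int],     # hypothetical: per-node topology access counts
    feat_hot: List[int],     # hypothetical: per-node feature access counts
    topo_bytes: List[int],   # bytes of each node's adjacency list
    feat_bytes: int,         # bytes of one feature vector (same for every node)
    budget: int,             # cache budget of one GPU, in bytes
    steps: int = 10,         # number of candidate topology/feature splits to try
) -> Tuple[int, List[int], List[int]]:
    """Return (estimated traffic, cached topology nodes, cached feature nodes)."""
    n = len(topo_hot)
    feat_sizes = [feat_bytes] * n
    topo_order = sorted(range(n), key=lambda v: -topo_hot[v])
    feat_order = sorted(range(n), key=lambda v: -feat_hot[v])
    best: Optional[Tuple[int, List[int], List[int]]] = None
    for k in range(steps + 1):
        # Give k/steps of the budget to topology; whatever it leaves unused goes to features.
        topo_cached = greedy_fill(topo_order, topo_bytes, budget * k // steps)
        used = sum(topo_bytes[v] for v in topo_cached)
        feat_cached = greedy_fill(feat_order, feat_sizes, budget - used)
        cost = (pcie_traffic(topo_hot, topo_bytes, topo_cached)
                + pcie_traffic(feat_hot, feat_sizes, feat_cached))
        if best is None or cost < best[0]:
            best = (cost, topo_cached, feat_cached)
    assert best is not None
    return best
```

For example, `plan_unified_cache([10, 1, 1, 1], [5, 5, 0, 0], [4, 4, 4, 4], 8, 16, steps=4)` returns `(48, [0, 1], [0])`. Spending half of the budget on topology beats caching only features (cost 52) or only topology (cost 80). A per-GPU greedy scan like this ignores the multi-GPU sharing and partitioning aspects described in the abstract. It is meant only to show why caching both data types can reduce PCIe traffic more than caching either one alone.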
Taxonomy
Topics: Caching and Content Delivery · Advanced Graph Neural Networks · Machine Learning and ELM
