Rethinking Efficiency and Redundancy in Training Large-scale Graphs
Xin Liu, Xunbin Xiong, Mingyu Yan, Runzhen Xue, Shirui Pan, Xiaochun Ye, Dongrui Fan

TL;DR
This paper introduces DropReef, a novel method to identify and remove redundant information in large-scale graphs, significantly improving GNN training efficiency without sacrificing accuracy.
Contribution
The paper proposes the first once-for-all approach to detecting and eliminating redundancy in large-scale graphs based on neighbor heterophily, improving the training efficiency of GNNs.
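Neighbor heterophily can be illustrated with a common definition: the fraction of a node's neighbors whose label differs from the node's own label. The sketch below is a minimal illustration under that assumption and is not the paper's exact metric or selection rule. The helper names `neighbor_heterophily` and `select_redundant`, the use of labels, and the two thresholds are all hypothetical. The rule that flags nodes combining high heterophily with many neighbors as redundancy candidates is likewise an illustrative assumption.

```python
def neighbor_heterophily(adj, labels):
    """Hypothetical metric: fraction of each node's neighbors with a different label.

    adj maps node -> list of neighbor nodes; labels maps node -> class label.
    Isolated nodes get a score of 0.0.
    """
    scores = {}
    for node, nbrs in adj.items():
        if not nbrs:
            scores[node] = 0.0
            continue
        differing = sum(1 for v in nbrs if labels[v] != labels[node])
        scores[node] = differing / len(nbrs)
    return scores


def select_redundant(adj, labels, het_threshold, degree_threshold):
    """Hypothetical rule: flag nodes whose heterophily AND degree both meet a threshold."""
    scores = neighbor_heterophily(adj, labels)
    return {
        n
        for n, s in scores.items()
        if s >= het_threshold and len(adj[n]) >= degree_threshold
    }


# Toy undirected graph with edges 0-1, 0-2, 0-3, 1-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
labels = {0: "a", 1: "b", 2: "b", 3: "b"}

print(neighbor_heterophily(adj, labels))  # {0: 1.0, 1: 0.5, 2: 0.5, 3: 1.0}
print(select_redundant(adj, labels, het_threshold=0.8, degree_threshold=3))  # {0}
```

Because these scores depend only on the graph and its labels, not on model weights, they can be computed a single time before training, which fits the once-for-all framing.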
Findings
DropReef significantly reduces training time.
DropReef maintains model accuracy after redundancy is removed.
DropReef is compatible with state-of-the-art sampling-based GNNs.
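Compatibility with sampling-based GNNs can be pictured as a preprocessing step that sits entirely outside the training loop: the graph is pruned once, and an unmodified sampler then draws mini-batch neighborhoods from the pruned graph every epoch. The sketch below assumes a plain adjacency-list graph, an already-chosen set of nodes to drop, and a GraphSAGE-style uniform neighbor sampler. `drop_nodes` and `sample_neighbors` are hypothetical helpers, not part of DropReef's code or any GNN library.

```python
import random


def drop_nodes(adj, dropped):
    """Remove the dropped nodes and every edge touching them (run once, before training)."""
    return {
        u: [v for v in nbrs if v not in dropped]
        for u, nbrs in adj.items()
        if u not in dropped
    }


def sample_neighbors(adj, batch, fanout, rng):
    """Uniformly sample up to `fanout` neighbors for each seed node in `batch`."""
    return {u: rng.sample(adj[u], min(fanout, len(adj[u]))) for u in batch}


# Toy undirected graph with edges 0-1, 0-2, 0-3, 1-2; node 0 is assumed to be redundant.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
pruned = drop_nodes(adj, dropped={0})  # one-time preprocessing
print(pruned)  # {1: [2], 2: [1], 3: []}

rng = random.Random(0)
for epoch in range(2):
    # The sampler is unchanged; it simply reads from the smaller, pruned graph.
    blocks = sample_neighbors(pruned, batch=[1, 2, 3], fanout=2, rng=rng)
    print(epoch, blocks)
```

Keeping the pruning step separate from the sampler is what would let the same reduced graph be reused across epochs and across different sampling-based models.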
Abstract
Large-scale graphs are ubiquitous in real-world scenarios, and Graph Neural Networks (GNNs) can be trained on them to generate representations for downstream tasks. Given the abundant information and complex topology of a large-scale graph, we argue that such graphs contain redundancy that degrades training efficiency. Unfortunately, limited model scalability severely restricts the efficiency of training on large-scale graphs with vanilla GNNs. Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue, and training these models on large-scale graphs still takes an intolerable amount of time. We therefore propose to drop redundancy and improve the efficiency of training GNNs on large-scale graphs by rethinking the inherent characteristics of a graph. In this paper, we are the first to propose a once-for-all method, termed DropReef, to drop the…
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning in Materials Science
