Graph Sparsification via Mixture of Graphs
Guibin Zhang, Xiangguo Sun, Yanwei Yue, Chonghe Jiang, Kun Wang, Tianlong Chen, Shirui Pan

TL;DR
This paper introduces Mixture-of-Graphs (MoG), a novel graph sparsification method that dynamically tailors edge pruning for each node, leading to more efficient GNNs with maintained or improved performance on large datasets.
Contribution
MoG leverages multiple sparsifier experts and a mixture approach on the Grassmann manifold to produce customized, high-sparsity subgraphs for each node, enhancing efficiency and accuracy.
Findings
Achieves higher sparsity levels (8.67% to 50.85%) with maintained or improved performance.
Provides 1.47-2.62x speedup in GNN inference with negligible performance loss.
Boosts GNN performance on large datasets, improving accuracy by up to 1.74%.
Abstract
Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigate these challenges is graph sparsification, which involves removing non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely on a single global sparsity setting and uniform pruning criteria, failing to provide customized sparsification schemes for each node's complex local context. In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixture-of-Experts (MoE), to dynamically select tailored pruning solutions for each node. Specifically, MoG incorporates multiple sparsifier experts, each characterized by unique sparsity levels and pruning criteria, and selects the appropriate experts for each…
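The per-node expert selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The two pruning criteria (`jaccard_score`, `degree_score`), the four-entry `EXPERTS` table, the plain top-k softmax router, and the rank-averaging of expert scores are all assumptions made for this example. In particular, the sketch does not attempt the Grassmann-manifold mixture mentioned in the Contribution section.

```python
import numpy as np

def jaccard_score(adj, u, v):
    # Hypothetical criterion: Jaccard similarity of the two neighbor sets.
    nu, nv = set(np.flatnonzero(adj[u])), set(np.flatnonzero(adj[v]))
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def degree_score(adj, u, v):
    # Hypothetical criterion: edges between low-degree nodes score higher.
    return 1.0 / (adj[u].sum() * adj[v].sum())

# Hypothetical experts: each pairs a sparsity level with a pruning criterion.
EXPERTS = [
    (0.25, jaccard_score),
    (0.50, jaccard_score),
    (0.25, degree_score),
    (0.50, degree_score),
]

def route(gate_logits, k):
    # Pick the k highest-scoring experts and softmax-normalize their weights.
    top = np.argsort(gate_logits)[-k:]
    w = np.exp(gate_logits[top] - gate_logits[top].max())
    return top, w / w.sum()

def sparsify_node(adj, node, gate_logits, k=2):
    # Prune the edges of one node's ego-net using its selected experts.
    neighbors = np.flatnonzero(adj[node])
    top, weights = route(gate_logits, k)
    scores = np.zeros(len(neighbors))
    sparsity = 0.0
    for idx, w in zip(top, weights):
        level, criterion = EXPERTS[idx]
        raw = np.array([criterion(adj, node, v) for v in neighbors])
        # Rank-normalize so that different criteria are comparable.
        ranks = raw.argsort().argsort() / max(len(neighbors) - 1, 1)
        scores += w * ranks
        sparsity += w * level
    n_drop = int(np.floor(sparsity * len(neighbors)))
    # Drop the n_drop lowest-scoring edges and keep the rest.
    return np.sort(neighbors[np.argsort(scores)[n_drop:]])

# Usage: node 0 is a hub; the gate logits (assumed given) favor experts 1 and 3.
adj = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]:
    adj[u, v] = adj[v, u] = 1
print(sparsify_node(adj, 0, np.array([0.0, 5.0, 0.0, 5.0])))
```

In a trained model the gate logits would come from a learned router conditioned on each node's local context, so different nodes end up with different sparsity levels and criteria. Here they are passed in by hand to keep the sketch self-contained.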
Peer Reviews
Decision: ICLR 2025 Spotlight
1. The experiments show MoG's adaptability across different graph learning tasks. Its ability to improve inference speed while maintaining or even boosting accuracy (up to 1.74% in some cases) demonstrates its practical benefits. 2. The paper introduces a novel method, Mixture-of-Graphs (MoG), which applies the Mixture-of-Experts (MoE) concept to graph sparsification. Unlike traditional sparsification methods with uniform criteria, MoG combines sparsity levels and pruning c
1. The paper does not clarify in “4 EXPERIMENTS” or “Appendix F.6” why the specific 12 combinations of sparsity levels and criteria were selected. Additionally, the variance of each row across combinations in Table 8 is minimal, raising questions about the distinctiveness of each sparsifier. Furthermore, in “Appendix F.6”, different sparsity criteria are applied in different datasets without an explanation of the selection rationale. To enhance experimental completeness, the authors might consid
1. The proposed method has a certain degree of innovation, especially in the use of MoE and the approach to combining graphs. 2. The proposed method can be integrated with any framework. 3. The paper includes extensive experiments.
1. Eq.11 is a core objective, but it is quite heuristic. Moreover, after obtaining the combined graph with Eq.12, there is a post-sparsification operation. Does the final ensembled sparse Laplacian truly integrate the eigenvectors of multiple sparsified ego-net Laplacians as the authors hope? Regardless of the degree achieved, it would be helpful to see experimental evidence here. 2. There are only two node classification and two graph classification datasets, which is relatively few.
1. Tackles a topical and highly relevant problem in graph learning on large graphs. 2. Highly flexible, well-motivated approach that accounts for local node variations while determining optimal pruning strategy. 3. Ample experiments across baselines and datasets, replete with sensitivity analysis, ablation studies, and efficiency comparison.
Adding the following minor clarifications around the methodology may be useful: 1. How are post-sparsified ego-graphs assembled? Is it possible that for two nodes $i$ and $j$, the post-sparsified ego graph of $i$ connects to $j$ but the post-sparsified ego graph of $j$ doesn’t connect to $i$? If yes, how is this handled? 2. How do we select $p$ in equation 10? What is its impact on performance? 3. What does $D$ in eq. 13 correspond to? Is it from the original ego graph of the given node? If yes,
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Graph Neural Networks · Data Mining Algorithms and Applications
Methods: Pruning
