Graph Knowledge Distillation to Mixture of Experts
Pavel Rumiantsev, Mark Coates

TL;DR
This paper introduces Routing-by-Memory (RbM), a Mixture-of-Experts model designed for graph knowledge distillation, which achieves more consistent node classification performance than standard MLP students.
Contribution
The paper proposes a novel MoE-based student model, Routing-by-Memory, that improves the consistency of knowledge distillation from GNNs across various datasets.
Findings
Routing-by-Memory outperforms MLP-based distillation in accuracy.
Experts in RbM specialize in different regions of the representation space.
The approach achieves more stable performance across multiple datasets.
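The finding that experts specialize in different regions of the representation space can be illustrated with a minimal routing sketch. It assumes, purely for illustration, that each expert owns one "memory" vector in the representation space and that a node representation is sent to the expert whose memory is nearest in Euclidean distance. The helpers `route_by_memory`, `make_linear_expert` and `moe_forward`, the linear experts and the hard top-1 routing are all hypothetical choices, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def route_by_memory(h: Sequence[float], memories: Sequence[Sequence[float]]) -> int:
    """Hypothetical router: return the index of the expert whose memory
    vector lies closest to the node representation h (ties go to the lower index)."""
    return min(range(len(memories)), key=lambda i: euclidean(h, memories[i]))


def make_linear_expert(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Expert:
    """Hypothetical expert: a single linear layer y = W h + b."""
    def expert(h: Sequence[float]) -> Vector:
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
    return expert


def moe_forward(h: Sequence[float], memories: Sequence[Sequence[float]],
                experts: Sequence[Expert]) -> Tuple[int, Vector]:
    """Hard top-1 routing (an assumption): only the selected expert is evaluated."""
    k = route_by_memory(h, memories)
    return k, experts[k](h)


# Two experts whose memories sit in different regions of a 2-D representation space.
memories = [[0.0, 0.0], [5.0, 5.0]]
experts = [
    make_linear_expert([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # identity map
    make_linear_expert([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),  # y = 2h + 1
]

print(moe_forward([4.0, 6.0], memories, experts))   # (1, [9.0, 13.0])
print(moe_forward([0.5, -0.5], memories, experts))  # (0, [0.5, -0.5])
```

In this sketch each input activates exactly one expert, so the per-node cost stays close to that of a single small MLP plus a distance computation against the memories; how RbM actually trains its memories and enforces specialization is described in the paper, not here.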
Abstract
In terms of accuracy, Graph Neural Networks (GNNs) are the best architectural choice for the node classification task. Their drawback in real-world deployment is the latency that emerges from the neighbourhood processing operation. One solution to the latency issue is to perform knowledge distillation from a trained GNN to a Multi-Layer Perceptron (MLP), where the MLP processes only the features of the node being classified (and possibly some pre-computed structural information). However, the performance of such MLPs in both transductive and inductive settings remains inconsistent for existing knowledge distillation techniques. We propose to address the performance concerns by using a specially-designed student model instead of an MLP. Our model, named Routing-by-Memory (RbM), is a form of Mixture-of-Experts (MoE), with a design that enforces expert specialization. By encouraging each…
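The baseline described in the abstract distills a trained GNN into a student that sees only the features of the node being classified. A common way to train such a student is the classic soft-target distillation objective: cross-entropy on ground-truth labels plus a temperature-scaled KL divergence to the teacher's softened predictions. The sketch below is a minimal per-node version of that generic objective; the function names, the weighting `lam`, the `temperature`, and the treatment of unlabelled nodes are illustrative assumptions, and the abstract does not state which loss RbM uses.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; entries with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: Optional[int], lam: float = 0.5, temperature: float = 1.0) -> float:
    """Hypothetical per-node objective:
    lam * CE(student, label) + (1 - lam) * T^2 * KL(teacher_T || student_T).
    Unlabelled nodes (label=None) receive only the distillation term (an assumption)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    distill = temperature ** 2 * kl_divergence(p_teacher, p_student)
    if label is None:
        return (1.0 - lam) * distill
    ce = -math.log(softmax(student_logits)[label])
    return lam * ce + (1.0 - lam) * distill


# A labelled node where the MLP student already matches the GNN teacher:
# the distillation term vanishes and only the cross-entropy term remains.
matched = kd_loss([2.0, 0.0], [2.0, 0.0], label=0, temperature=2.0)
# An unlabelled node where the student disagrees with the teacher.
mismatched = kd_loss([0.0, 0.0], [3.0, 0.0], label=None, temperature=2.0)
print(matched, mismatched)
```

The `T^2` factor is the usual rescaling that keeps the gradient magnitude of the softened term comparable across temperatures; at inference time only the student's forward pass on node features is needed, which is the source of the latency savings the abstract describes.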
Taxonomy
Topics: Advanced Graph Neural Networks · Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications
Methods: Knowledge Distillation
