Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification
Jiajun Zhou, Yadong Li, Xuanze Chen, Chen Ma, Chuang Zhao, Shanqing Yu, Qi Xuan

TL;DR
D2MoE introduces a difficulty-aware, dynamic routing mechanism for Graph Neural Networks that adaptively allocates expert resources based on node difficulty, improving accuracy and efficiency.
Contribution
The paper proposes a difficulty-driven top-p routing method for MoE-based GNNs that uses predictive entropy to allocate expert resources adaptively to each node.
Findings
Achieves up to 7.92% accuracy improvement on heterophilous graphs.
Reduces memory usage by up to 73.07% and training time by 46.53%.
Outperforms static MoE architectures across 13 benchmarks.
Abstract
Mixture-of-Experts (MoE) architectures offer a scalable path for Graph Neural Networks (GNNs) in node classification tasks but typically rely on static and rigid routing strategies that enforce a uniform expert budget or coarse-grained expert toggles on all nodes. This limitation overlooks the varying discriminative difficulty of nodes and leads to under-fitting for hard nodes and redundant computation for easy ones. To resolve this issue, we propose D2MoE, a novel framework that shifts the focus from static expert selection to node-wise expert resource allocation. By using predictive entropy as a real-time proxy for difficulty, D2MoE employs a difficulty-driven top-p routing mechanism to adaptively concentrate expert resources on hard nodes while reducing overhead for easy ones, achieving continuous and fine-grained expert budget scaling for node classification. Experiments on 13…
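To make the routing idea in the abstract concrete, the following is a minimal NumPy sketch of entropy-driven top-p expert selection. It is not the authors' implementation. The abstract does not say how entropy is mapped to the top-p threshold, so the linear mapping, the bounds `p_min` and `p_max`, and all function names here are assumptions made for illustration. The sketch also assumes that per-node class probabilities and gate logits are already available from some upstream model.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_entropy(class_probs):
    """Shannon entropy of each node's class distribution, normalized to [0, 1]."""
    eps = 1e-12
    h = -(class_probs * np.log(class_probs + eps)).sum(axis=-1)
    return h / np.log(class_probs.shape[-1])

def difficulty_top_p_routing(gate_logits, class_probs, p_min=0.3, p_max=0.9):
    """Select, per node, the smallest expert set whose gate mass reaches p.

    ASSUMPTION: p grows linearly with normalized entropy between the
    hypothetical bounds p_min and p_max; the paper may use another mapping.
    """
    gate = softmax(gate_logits)                   # (nodes, experts)
    difficulty = predictive_entropy(class_probs)  # (nodes,)
    p = p_min + (p_max - p_min) * difficulty      # per-node threshold

    # Sort experts by descending gate probability.
    order = np.argsort(-gate, axis=-1)
    sorted_gate = np.take_along_axis(gate, order, axis=-1)

    # Keep an expert if the mass accumulated before it is still below p.
    # The top expert always has zero prior mass, so it is always kept.
    mass_before = np.cumsum(sorted_gate, axis=-1) - sorted_gate
    keep_sorted = mass_before < p[:, None]

    # Scatter the keep flags back to the original expert order.
    mask = np.zeros_like(gate, dtype=bool)
    np.put_along_axis(mask, order, keep_sorted, axis=-1)

    # Renormalize the gate weights over the selected experts only.
    weights = np.where(mask, gate, 0.0)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights, mask.sum(axis=-1)

# Two nodes share the same gate logits but differ in prediction confidence.
gate_logits = np.array([[2.0, 1.0, 0.5, 0.0],
                        [2.0, 1.0, 0.5, 0.0]])
class_probs = np.array([[0.97, 0.01, 0.01, 0.01],   # confident: easy node
                        [0.25, 0.25, 0.25, 0.25]])  # uniform: hard node

weights, counts = difficulty_top_p_routing(gate_logits, class_probs)
print(counts)  # → [1 3]
```

In this toy example the confident node is routed to a single expert, while the maximally uncertain node receives three of the four, which is the kind of per-node budget scaling the abstract describes.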
