Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification

Jiajun Zhou; Yadong Li; Xuanze Chen; Chen Ma; Chuang Zhao; Shanqing Yu; Qi Xuan

arXiv:2604.11473·cs.LG·April 14, 2026

TL;DR

D2MoE introduces a difficulty-aware, dynamic routing mechanism for Graph Neural Networks that adaptively allocates expert resources based on node difficulty, improving accuracy and efficiency.

Contribution

The paper proposes a difficulty-driven top-p routing method that uses predictive entropy to allocate expert resources adaptively in MoE-based GNNs.

Findings

01. Achieves up to 7.92% accuracy improvement on heterophilous graphs.

02. Reduces memory usage by up to 73.07% and training time by 46.53%.

03. Outperforms static MoE architectures across 13 benchmarks.

Abstract

Mixture-of-Experts (MoE) architectures offer a scalable path for Graph Neural Networks (GNNs) in node classification tasks but typically rely on static and rigid routing strategies that enforce a uniform expert budget or coarse-grained expert toggles on all nodes. This limitation overlooks the varying discriminative difficulty of nodes and leads to under-fitting for hard nodes and redundant computation for easy ones. To resolve this issue, we propose D2MoE, a novel framework that shifts the focus from static expert selection to node-wise expert resource allocation. By using predictive entropy as a real-time proxy for difficulty, D2MoE employs a difficulty-driven top-p routing mechanism to adaptively concentrate expert resources on hard nodes while reducing overhead for easy ones, achieving continuous and fine-grained expert budget scaling for node classification. Experiments on 13…
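
To make the routing idea concrete, here is a minimal NumPy sketch of entropy-driven top-p expert selection. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how entropy is mapped to the threshold p, so the linear mapping between the hypothetical parameters `p_min` and `p_max`, the normalization of entropy by log(number of classes), and all function names are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_entropy(class_logits):
    # Entropy of the predicted class distribution, scaled to [0, 1]
    # by dividing by log(num_classes). The scaling is an assumption.
    probs = softmax(class_logits)
    h = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    return h / np.log(class_logits.shape[-1])

def difficulty_top_p(gate_logits, difficulty, p_min=0.3, p_max=0.9):
    # Hypothetical mapping: harder nodes get a larger cumulative-mass
    # threshold p, so more experts are activated for them.
    gate = softmax(gate_logits)
    p = p_min + (p_max - p_min) * difficulty

    # Sort experts by gate score, highest first.
    order = np.argsort(-gate, axis=-1)
    sorted_gate = np.take_along_axis(gate, order, axis=-1)
    cum = np.cumsum(sorted_gate, axis=-1)

    # Keep an expert if the mass accumulated before it is still below p.
    # The top-scoring expert is always kept because that mass is 0.
    keep_sorted = (cum - sorted_gate) < p[:, None]

    # Scatter the keep decisions back to the original expert order.
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=-1)

    # Renormalize gate weights over the selected experts.
    weights = gate * mask
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights, mask

# Two nodes: a confident (easy) prediction and a uniform (hard) one.
class_logits = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
gate_logits = np.zeros((2, 4))  # four experts, equal gate scores
difficulty = predictive_entropy(class_logits)
weights, mask = difficulty_top_p(gate_logits, difficulty)
experts_per_node = mask.sum(axis=-1)
```

In this toy setup the hard node activates more experts than the easy one, which is the per-node budget scaling the abstract describes. A real MoE layer would compute the gate scores from node features and apply the resulting weights to expert outputs.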
