TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration
Linye Wei, Zixiang Luo, Pingzhi Tang, Meng Li

TL;DR
TEAM is a framework that accelerates MoE diffusion language models by exploiting temporal and spatial consistency in expert routing, reducing inference overhead while maintaining performance.
Contribution
The paper introduces TEAM, a plug-and-play expert activation strategy that exploits temporal and spatial consistency in expert routing to speed up MoE dLLM inference.
Findings
Achieves up to 2.2x speedup over vanilla MoE dLLMs.
Incurs negligible performance degradation.
Demonstrates effectiveness through extensive experiments.
Abstract
Diffusion large language models (dLLMs) have recently gained significant attention due to their inherent support for parallel decoding. Building on this paradigm, Mixture-of-Experts (MoE) dLLMs with autoregressive (AR) initialization have further demonstrated strong performance competitive with mainstream AR models. However, we identify a fundamental mismatch between MoE architectures and diffusion-based decoding. Specifically, a large number of experts are activated at each denoising step, while only a small subset of tokens is ultimately accepted, resulting in substantial inference overhead and limiting their deployment in latency-sensitive applications. In this work, we propose TEAM, a plug-and-play framework that accelerates MoE dLLMs by enabling more accepted tokens with fewer activated experts. TEAM is motivated by the observation that expert routing decisions exhibit strong…
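The mismatch described in the abstract can be made concrete with a toy count. The following is a minimal sketch, not the authors' method: it assumes a single MoE layer with top-k routing, draws hypothetical router scores at random rather than from a real model, and uses made-up sizes (`num_tokens`, `num_experts`, `k`, `num_accepted`). It only compares how many distinct experts one denoising step touches with how many tokens that step accepts.

```python
import random


def top_k_experts(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return ranked[:k]


def activated_experts(router_scores, k):
    """Union of the experts selected by any token in the block (one MoE layer)."""
    active = set()
    for scores in router_scores:
        active.update(top_k_experts(scores, k))
    return active


def simulate_step(num_tokens=32, num_experts=64, k=8, num_accepted=4, seed=0):
    """Simulate one denoising step with hypothetical, randomly drawn router scores."""
    rng = random.Random(seed)
    # Assumption: i.i.d. random scores. Real routers are correlated across
    # tokens and steps, so this likely overstates how widely routing spreads.
    router_scores = [
        [rng.random() for _ in range(num_experts)] for _ in range(num_tokens)
    ]
    active = activated_experts(router_scores, k)
    return {
        "activated_experts": len(active),
        "accepted_tokens": num_accepted,
        "experts_per_accepted_token": len(active) / num_accepted,
    }


stats = simulate_step()
print(stats)
```

Every token in the block is routed at each step, so the number of experts loaded grows with the block size, while the useful output of the step is only the accepted tokens. Because the scores here are independent, the sketch says nothing about the routing consistency that TEAM relies on; it illustrates the cost side of the problem only.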
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
