Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models

Shuibai Zhang; Caspian Zhuang; Chihan Cui; Zhihan Yang; Fred Zhangzhi Peng; Yanxin Zhang; Haoyue Bai; Zack Jia; Yang Zhou; Guanhua Chen; Ming Liu

arXiv:2604.01622 · cs.LG · April 3, 2026

TL;DR

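This paper replaces token-choice routing with expert-choice routing in diffusion language models, which balances expert load by design and improves throughput and convergence. It also introduces timestep-dependent expert capacity, which allocates more compute to low-mask-ratio denoising steps, where tokens are learned most efficiently.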

Contribution

The paper demonstrates that expert-choice (EC) routing outperforms token-choice (TC) routing in diffusion language models, and that existing TC models can be retrofitted to EC for faster convergence and better accuracy.

Findings

01. EC routing provides deterministic load balancing and higher throughput.

02. Allocating more capacity to low-mask-ratio steps improves performance.

03. Retrofitting TC models to EC yields faster convergence and better accuracy.
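
To make the load-balancing claim concrete, here is a minimal sketch of the two routing rules on a toy score matrix. It is an illustration, not the paper's implementation: the function names (`token_choice`, `expert_choice`, `make_scores`) and the random scores are assumptions introduced here. Under token choice each token picks its top-k experts, so the load per expert depends on the scores; under expert choice each expert picks a fixed number of tokens, so every expert receives exactly `capacity` tokens.

```python
import random


def token_choice(scores, k):
    """Each token selects its k highest-scoring experts (hypothetical helper)."""
    num_experts = len(scores[0])
    assignment = [[] for _ in range(num_experts)]
    for t, row in enumerate(scores):
        top = sorted(range(num_experts), key=lambda e: row[e], reverse=True)[:k]
        for e in top:
            assignment[e].append(t)
    return assignment


def expert_choice(scores, capacity):
    """Each expert selects its `capacity` highest-scoring tokens (hypothetical helper)."""
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = []
    for e in range(num_experts):
        top = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment.append(sorted(top[:capacity]))
    return assignment


def make_scores(num_tokens, num_experts, seed=0):
    """Random router scores standing in for a learned router (assumption)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_experts)] for _ in range(num_tokens)]


scores = make_scores(num_tokens=16, num_experts=4)

# Token choice: 16 tokens * 2 experts each = 32 assignments, unevenly spread.
tc_loads = [len(tokens) for tokens in token_choice(scores, k=2)]

# Expert choice with the same total budget: 32 / 4 experts = 8 tokens each.
ec_loads = [len(tokens) for tokens in expert_choice(scores, capacity=8)]

print("token-choice loads: ", tc_loads)
print("expert-choice loads:", ec_loads)
```

Both rules spend the same total number of token-expert assignments here. The difference is that the expert-choice loads are fixed by `capacity` rather than by the scores, which is why no auxiliary balancing term is needed in this sketch. A token may be picked by several experts or by none under expert choice.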

Abstract

Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, leading to load imbalance and rigid computation allocation. We show that expert-choice (EC) routing is a better fit for DLMs: it provides deterministic load balancing by design, yielding higher throughput and faster convergence than TC. Building on the property that EC capacity is externally controllable, we introduce timestep-dependent expert capacity, which varies expert allocation according to the denoising step. We find that allocating more capacity to low-mask-ratio steps consistently achieves the best performance under matched FLOPs, and provide a mechanistic explanation: tokens in low-mask-ratio contexts exhibit an order-of-magnitude higher learning efficiency, so concentrating compute…
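
The idea of timestep-dependent expert capacity can be sketched as a schedule that maps the mask ratio of a denoising step to a per-expert token budget. The abstract does not give the schedule's functional form, so the linear form below, the names `capacity_factor` and `expert_capacity`, and the `slope` parameter are all assumptions made for illustration. The schedule is built so that its average over uniformly spaced mask ratios equals `base`, which is one simple way to keep average compute fixed; the paper's own FLOP-matching procedure may differ.

```python
def capacity_factor(mask_ratio, base=1.0, slope=0.5):
    """Hypothetical linear schedule: more capacity when fewer tokens are masked.

    Returns base * (1 + slope) at mask_ratio = 0 and base * (1 - slope) at
    mask_ratio = 1, so the mean over uniformly spaced ratios is `base`.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    if not 0.0 <= slope < 1.0:
        raise ValueError("slope must lie in [0, 1) to keep capacity positive")
    return base * (1.0 + slope * (1.0 - 2.0 * mask_ratio))


def expert_capacity(num_tokens, num_experts, top_k, mask_ratio, slope=0.5):
    """Tokens each expert processes at this step (assumed rounding rule)."""
    balanced = num_tokens * top_k / num_experts  # budget of a uniform split
    return max(1, round(capacity_factor(mask_ratio, slope=slope) * balanced))


# Per-expert budget at a few mask ratios, for 1024 tokens, 8 experts, top_k = 2.
for ratio in (0.1, 0.5, 0.9):
    print(ratio, expert_capacity(1024, 8, 2, ratio))

# The average factor over a symmetric grid of mask ratios stays at the base value.
grid = [i / 100 for i in range(101)]
mean_factor = sum(capacity_factor(r) for r in grid) / len(grid)
```

Because expert-choice capacity is an external argument rather than an outcome of the router, a schedule like this can be changed without touching the routing rule itself.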

Code & Models

Repositories

zhangshuibai/EC-DLM (GitHub)
