Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs
Hao Mark Chen, Zhiwen Mo, Royson Lee, Qianzhou Wang, Da Li, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan

TL;DR
This paper introduces Dynamic Expert Sharing (DES), a method that reduces memory traffic and improves efficiency in Mixture-of-Experts diffusion large language models by selecting a compact set of experts for parallel decoding.
Contribution
The paper proposes DES, a novel sequence-level expert selection technique that decouples memory usage from parallelism in MoE dLLMs, with two new strategies: DES-Seq and DES-Vote.
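To make "sequence-level expert selection" concrete, here is a minimal sketch of one way a shared expert set could be chosen for a block of tokens decoded in parallel. It is an illustrative assumption, not the paper's DES-Seq or DES-Vote algorithm, whose details are not given in this summary. The function names `select_shared_experts` and `route_within`, the `budget` parameter, and the simple top-k voting rule are all hypothetical.

```python
def select_shared_experts(router_scores, top_k, budget):
    """Pick a shared expert set for a block of tokens (hypothetical voting rule).

    router_scores: one list of per-expert scores for each token.
    top_k: number of experts each token would normally activate.
    budget: maximum size of the shared expert set (assumed parameter).
    """
    num_experts = len(router_scores[0])
    votes = [0] * num_experts
    for scores in router_scores:
        # Each token votes for its own top_k experts.
        ranked = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)
        for e in ranked[:top_k]:
            votes[e] += 1
    # Keep the most-voted experts; ties go to the lower expert id.
    shared = sorted(range(num_experts), key=lambda e: (-votes[e], e))[:budget]
    return sorted(shared)


def route_within(scores, shared, top_k):
    """Route one token to its best top_k experts inside the shared set."""
    return sorted(sorted(shared, key=lambda e: scores[e], reverse=True)[:top_k])


# Toy router scores: 3 tokens, 4 experts (made-up numbers for illustration).
scores = [
    [0.50, 0.30, 0.15, 0.05],
    [0.45, 0.05, 0.40, 0.10],
    [0.60, 0.10, 0.20, 0.05],
]
shared = select_shared_experts(scores, top_k=2, budget=2)
per_token = [route_within(s, shared, top_k=2) for s in scores]
print(shared, per_token)
```

In this toy input, independent top-2 routing would touch three distinct experts, while the shared set touches only two, so fewer expert weights need to be loaded for the block. How the real method trades that saving against accuracy is not something this sketch captures.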
Findings
Reduces expert activations by over 55%.
Lowers latency by up to 38%.
Retains 99% of vanilla model accuracy.
Abstract
Among parallel decoding paradigms, diffusion large language models (dLLMs) have emerged as a promising candidate that balances generation quality and throughput. However, their integration with Mixture-of-Experts (MoE) architectures is constrained by an expert explosion: as the number of tokens generated in parallel increases, the number of distinct experts activated grows nearly linearly. This results in substantial memory traffic that pushes inference into a memory-bound regime, negating the efficiency gains of both MoE and parallel decoding. To address this challenge, we propose Dynamic Expert Sharing (DES), a novel technique that shifts MoE optimization from token-centric pruning and conventional expert skipping methods to sequence-level coreset selection. To maximize expert reuse, DES identifies a compact, high-utility set of experts to satisfy the requirements of an entire…
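The expert explosion described in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions: uniform random routing stands in for a learned router, and the values `num_experts=256` and `top_k=8` are made up for illustration rather than taken from the paper.

```python
import random


def distinct_experts(num_tokens, num_experts=256, top_k=8, seed=0):
    """Count distinct experts activated by num_tokens parallel tokens.

    Assumption: each token picks top_k experts uniformly at random,
    a stand-in for a real learned router.
    """
    rng = random.Random(seed)
    active = set()
    for _ in range(num_tokens):
        active.update(rng.sample(range(num_experts), top_k))
    return len(active)


# The same seed is reused, so a larger block extends the smaller block's tokens.
counts = {n: distinct_experts(n) for n in (1, 2, 4, 8, 16)}
for n, c in counts.items():
    print(f"{n:>2} parallel tokens -> {c} distinct experts")
```

While the block size is small relative to the expert pool, the union of activated experts grows roughly in proportion to the number of parallel tokens. Each newly activated expert means more weights to fetch from memory, which is the traffic that sequence-level sharing aims to cap.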
Taxonomy
Topics: Topic Modeling · Stochastic Gradient Optimization Techniques · Speech Recognition and Synthesis
