Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs

Hao Mark Chen; Zhiwen Mo; Royson Lee; Qianzhou Wang; Da Li; Shell Xu Hu; Wayne Luk; Timothy Hospedales; Hongxiang Fan

arXiv:2602.00879 · cs.LG · February 3, 2026

TL;DR

This paper introduces Dynamic Expert Sharing (DES), a method that reduces memory traffic and improves efficiency in Mixture-of-Experts diffusion large language models by selecting, at the sequence level, a compact set of experts shared across the tokens decoded in parallel.

Contribution

The paper proposes DES, a novel sequence-level expert selection technique that decouples memory usage from parallelism in MoE dLLMs, with two new strategies: DES-Seq and DES-Vote.

Findings

01. Reduces expert activations by over 55%.

02. Lowers latency by up to 38%.

03. Retains 99% of vanilla model accuracy.

Abstract

Among parallel decoding paradigms, diffusion large language models (dLLMs) have emerged as a promising candidate that balances generation quality and throughput. However, their integration with Mixture-of-Experts (MoE) architectures is constrained by an expert explosion: as the number of tokens generated in parallel increases, the number of distinct experts activated grows nearly linearly. This results in substantial memory traffic that pushes inference into a memory-bound regime, negating the efficiency gains of both MoE and parallel decoding. To address this challenge, we propose Dynamic Expert Sharing (DES), a novel technique that shifts MoE optimization from token-centric pruning and conventional expert skipping methods to sequence-level coreset selection. To maximize expert reuse, DES identifies a compact, high-utility set of experts to satisfy the requirements of an entire…
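
The expert explosion described above can be illustrated with a toy simulation. The sketch below is not the paper's DES-Seq or DES-Vote strategy, whose details are not given here. It assumes independent top-k routing per token over uniformly random router scores (real routers are typically more skewed), and the function `shared_expert_routing` is a hypothetical stand-in that keeps the experts with the largest summed score across the sequence. All names and parameters (`NUM_EXPERTS`, `TOP_K`, `budget`) are illustrative assumptions.

```python
import random


def route_top_k(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def distinct_experts(token_scores, k):
    """Union of experts activated when every token routes independently."""
    active = set()
    for scores in token_scores:
        active.update(route_top_k(scores, k))
    return active


def shared_expert_routing(token_scores, k, budget):
    """Hypothetical sequence-level selection (an assumption, not the paper's method).

    Keep the `budget` experts with the largest summed router score over all
    tokens, then route each token to its top-k experts inside that shared set.
    """
    num_experts = len(token_scores[0])
    totals = [sum(s[e] for s in token_scores) for e in range(num_experts)]
    shared = sorted(range(num_experts), key=lambda e: totals[e], reverse=True)[:budget]
    routes = [
        sorted(shared, key=lambda e: scores[e], reverse=True)[:k]
        for scores in token_scores
    ]
    return set(shared), routes


if __name__ == "__main__":
    rng = random.Random(0)
    NUM_EXPERTS, TOP_K = 64, 2  # illustrative sizes, not taken from the paper
    for n_tokens in (1, 4, 16, 64):
        scores = [[rng.random() for _ in range(NUM_EXPERTS)] for _ in range(n_tokens)]
        independent = distinct_experts(scores, TOP_K)
        shared, _ = shared_expert_routing(scores, TOP_K, budget=8)
        print(f"tokens={n_tokens:3d}  independent={len(independent):2d}  shared={len(shared)}")
```

With independent routing, the number of distinct experts whose weights must be loaded is bounded only by the smaller of `n_tokens * TOP_K` and `NUM_EXPERTS`, so it grows with the parallel width. With a shared set it is capped at `budget` regardless of how many tokens are decoded together, which is the sense in which memory traffic is decoupled from parallelism.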

Taxonomy

Topics: Topic Modeling · Stochastic Gradient Optimization Techniques · Speech Recognition and Synthesis