From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs
Kumari Nishu, Sachin Mehta, Samira Abnar, Mehrdad Farajtabar, Maxwell Horton, Mahyar Najibi, Moin Nabi, Minsik Cho, Devang Naik

TL;DR
DynaMoE is a post-training framework that transforms pre-trained dense LLMs into dynamic, token-difficulty-aware Mixture-of-Experts models, enabling customizable efficiency-accuracy trade-offs with minimal fine-tuning.
Contribution
It introduces a token-difficulty-driven routing mechanism for adapting pre-trained LLMs into dynamic MoE models with low fine-tuning cost.
Findings
Yields model variants with different efficiency-accuracy trade-offs.
Uses only 10B tokens for adaptation, significantly less than full training.
Maintains similar accuracy to baseline methods with reduced fine-tuning cost.
Abstract
Training large language models (LLMs) for different inference constraints is computationally expensive, limiting control over efficiency-accuracy trade-offs. Moreover, once trained, these models typically process tokens uniformly, regardless of their complexity, leading to static and inflexible behavior. In this paper, we introduce a post-training optimization framework, DynaMoE, that adapts a pre-trained dense LLM to a token-difficulty-driven Mixture-of-Experts model with minimal fine-tuning cost. This adaptation makes the model dynamic, with sensitivity control to customize the balance between efficiency and accuracy. DynaMoE features a token-difficulty-aware router that predicts the difficulty of tokens and directs them to the appropriate sub-networks or experts, enabling larger experts to handle more complex tokens and smaller experts to process simpler ones. Our experiments…
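The routing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: DynaMoE learns its router, whereas here the per-token difficulty scores are simply given as inputs. The names `route_tokens` and `mean_relative_cost`, the thresholds, and the relative expert costs are all hypothetical values chosen for illustration.

```python
from bisect import bisect_right


def route_tokens(difficulties, thresholds):
    """Map each difficulty score to an expert index (0 = smallest expert).

    `thresholds` must be sorted ascending. A score below the first threshold
    goes to expert 0; a score at or above the last goes to the largest expert.
    Both the scores and the thresholds are illustrative assumptions.
    """
    return [bisect_right(thresholds, d) for d in difficulties]


def mean_relative_cost(assignments, expert_costs):
    """Average per-token compute, relative to always using the largest expert."""
    return sum(expert_costs[i] for i in assignments) / len(assignments)


# Hypothetical difficulty scores in [0, 1] for four tokens.
difficulties = [0.05, 0.40, 0.90, 0.20]
# Hypothetical relative costs of three experts of increasing size.
expert_costs = [0.25, 0.5, 1.0]

# Lower thresholds send more tokens to larger experts.
accurate = route_tokens(difficulties, [0.3, 0.7])
# Higher thresholds send more tokens to smaller experts.
efficient = route_tokens(difficulties, [0.5, 0.95])

cost_accurate = mean_relative_cost(accurate, expert_costs)
cost_efficient = mean_relative_cost(efficient, expert_costs)
```

In this sketch the thresholds play the role of the sensitivity control: raising them shifts tokens toward smaller experts and lowers the average cost, presumably at some loss in accuracy.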
Taxonomy
Topics: Digital Rights Management and Security
