DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs

Minxuan Lv; Zhenpeng Su; Leiyu Pan; Yizhe Xiong; Zijia Lin; Hui Chen; Wei Zhou; Jungong Han; Guiguang Ding; Cheng Luo; Di Zhang; Kun Gai; Songlin Hu

arXiv:2502.12455 · cs.CL · September 16, 2025

TL;DR

DSMoE partitions the layers of a dense LLM into experts and routes each token through them dynamically, allocating computation according to input complexity to improve efficiency while preserving performance.

Contribution

The paper presents a matrix-partitioned expert architecture with dynamic routing and a sparsity loss, improving efficiency without discarding model knowledge.

Findings

01. Outperforms existing pruning and MoE methods under similar computational budgets.

02. Learns distinctive layerwise activation patterns that inform future MoE designs.

03. Excels particularly in language generation tasks.

Abstract

As large language models continue to scale, computational costs and resource consumption have emerged as significant challenges. While existing sparsification methods like pruning reduce computational overhead, they risk losing model knowledge through parameter removal. This paper proposes DSMoE (Dynamic Sparse Mixture-of-Experts), a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge based on input complexity. Additionally, we introduce a sparsity loss term to balance performance and computational efficiency. Extensive experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing…
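The partition-and-route idea in the abstract can be illustrated with a small forward-only sketch. It is not the authors' implementation: the function names, the gated (SwiGLU-style) FFN layout, the equal-sized contiguous blocks, the binary mask, and the `threshold` parameter are all assumptions made for illustration. The straight-through estimator and the sparsity loss only matter during training and are not shown. What the sketch does demonstrate is that slicing an FFN along its intermediate dimension loses nothing: with every block active, the partitioned output matches the dense output.

```python
import math
import random


def silu(v):
    return v / (1.0 + math.exp(-v))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    # W is a list of rows; returns W @ x as a list.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dense_ffn(x, W_gate, W_up, W_down):
    # Gated FFN: W_down @ (silu(W_gate @ x) * (W_up @ x)).
    h = [silu(g) * u for g, u in zip(matvec(W_gate, x), matvec(W_up, x))]
    return matvec(W_down, h)


def partitioned_ffn(x, W_gate, W_up, W_down, W_router, n_experts, threshold=0.5):
    # Hypothetical router: one sigmoid score per block; a block runs
    # only if its score exceeds `threshold` (an assumed hyperparameter).
    size = len(W_gate) // n_experts
    scores = [sigmoid(s) for s in matvec(W_router, x)]
    mask = [1 if s > threshold else 0 for s in scores]
    out = [0.0] * len(W_down)
    for e, active in enumerate(mask):
        if not active:
            continue  # skipped blocks cost no FFN compute
        lo, hi = e * size, (e + 1) * size
        # Each "expert" is a slice of the original pre-trained matrices.
        h = [silu(g) * u
             for g, u in zip(matvec(W_gate[lo:hi], x), matvec(W_up[lo:hi], x))]
        for i, row in enumerate(W_down):
            out[i] += sum(w * hj for w, hj in zip(row[lo:hi], h))
    return out, mask


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


d_model, d_ff, n_experts = 4, 8, 4
W_gate, W_up = rand_matrix(d_ff, d_model), rand_matrix(d_ff, d_model)
W_down = rand_matrix(d_model, d_ff)
W_router = rand_matrix(n_experts, d_model)  # stands in for a learned router
x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]

dense = dense_ffn(x, W_gate, W_up, W_down)
# threshold=0.0 activates every block, since sigmoid scores are always > 0.
full, full_mask = partitioned_ffn(x, W_gate, W_up, W_down, W_router,
                                  n_experts, threshold=0.0)
sparse, sparse_mask = partitioned_ffn(x, W_gate, W_up, W_down, W_router,
                                      n_experts, threshold=0.5)
```

Here `full` agrees with `dense` up to floating-point rounding, while `sparse` uses only the blocks whose router score exceeds 0.5, so the number of active blocks can differ from token to token.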

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Data Mining Algorithms and Applications

Methods: Mixture of Experts · LLaMA · Pruning · Sigmoid Activation