Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging

Lujun Li; Zhu Qiyuan; Jiacheng Wang; Wei Li; Hao Gu; Sirui Han; Yike Guo

arXiv:2506.23266·cs.LG·July 1, 2025

TL;DR

Sub-MoE is a framework for compressing Mixture-of-Experts large language models by merging experts in a shared subspace, cutting the number of expert parameters while retaining most of the original performance.

Contribution

The paper proposes Subspace Expert Merging, a method that combines adaptive expert clustering with shared subspace extraction and improves on existing expert merging techniques.

Findings

01. Maintains 96% of original performance with 25% expert reduction on Mixtral-8x7B.

02. Outperforms existing expert pruning and merging methods.

03. Effective expert compression with minimal performance loss.

Abstract

Mixture of Experts (MoE) LLMs face significant obstacles due to their massive parameter scale, which imposes memory, storage, and deployment challenges. Although recent expert merging methods promise greater efficiency by consolidating multiple experts, they are fundamentally hindered by parameter conflicts arising from expert specialization. In this paper, we present Sub-MoE, a novel MoE compression framework via Subspace Expert Merging. Our key insight is to perform joint Singular Value Decomposition (SVD) on concatenated expert weights, reducing conflicting parameters by extracting shared $U$-matrices while enabling effective merging of the expert-specific $V$ components. Specifically, Sub-MoE consists of two innovative phases: (1) Adaptive Expert Clustering, which groups functionally coherent experts via K-means clustering based on cosine similarity of expert outputs; and (2)…
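
The two ideas named in the abstract, clustering experts by the cosine similarity of their outputs and taking a joint SVD of concatenated expert weights, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names `cluster_experts` and `merge_cluster`, the horizontal concatenation, the first-k centre initialisation, the fixed number of clusters, and the plain averaging of the expert-specific factors are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def cluster_experts(outputs, k, iters=20):
    """K-means on L2-normalised expert outputs (a cosine-similarity proxy).

    outputs: array of shape (num_experts, feature_dim), one flattened
    output vector per expert. Hypothetical helper, not from the paper.
    """
    x = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    centers = x[:k].copy()  # assumption: initialise with the first k experts
    for _ in range(iters):
        # Assign each expert to its most cosine-similar centre.
        labels = np.argmax(x @ centers.T, axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members):
                mean = members.mean(axis=0)
                centers[c] = mean / np.linalg.norm(mean)
    return labels


def merge_cluster(weights, rank):
    """Joint SVD over horizontally concatenated expert weights.

    weights: list of (d_out, d_in) matrices from one cluster.
    Returns a shared left basis U and one merged right-hand factor.
    """
    n = len(weights)
    d_in = weights[0].shape[1]
    concat = np.concatenate(weights, axis=1)             # (d_out, n * d_in)
    u, s, vt = np.linalg.svd(concat, full_matrices=False)
    u_shared = u[:, :rank]                                # basis shared by all experts
    sv = s[:rank, None] * vt[:rank]                       # (rank, n * d_in)
    # Split the column axis back into one (rank, d_in) block per expert.
    per_expert = sv.reshape(rank, n, d_in).transpose(1, 0, 2)
    merged = per_expert.mean(axis=0)                      # assumption: plain average
    return u_shared, merged


# Toy usage with random weights and random calibration inputs.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((8, 6)) for _ in range(4)]
probe = rng.standard_normal((6, 16))
outputs = np.stack([(w @ probe).ravel() for w in experts])
labels = cluster_experts(outputs, k=2)
for c in np.unique(labels):
    group = [w for w, l in zip(experts, labels) if l == c]
    u_shared, merged = merge_cluster(group, rank=4)
    print(c, len(group), (u_shared @ merged).shape)
```

Each merged expert is reconstructed as `u_shared @ merged`. At full rank this reduces to the plain average of the cluster's weights, so any benefit over naive averaging in this sketch would come from truncating to a lower rank and from how the expert-specific factors are combined.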

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Expert finding and Q&A systems

Methods: Mixture of Experts · ALIGN · k-Means Clustering · Pruning