Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
Lujun Li, Zhu Qiyuan, Jiacheng Wang, Wei Li, Hao Gu, Sirui Han, Yike Guo

TL;DR
Sub-MoE introduces a novel framework for compressing Mixture-of-Experts large language models by merging experts in a shared subspace, significantly reducing parameters while maintaining high performance.
Contribution
It proposes a new Subspace Expert Merging method with adaptive clustering and shared subspace extraction, improving over existing expert merging techniques.
Findings
Maintains 96% of original performance with 25% expert reduction on Mixtral-8x7B.
Outperforms existing expert pruning and merging methods.
Effective expert compression with minimal performance loss.
Abstract
Mixture of Experts (MoE) LLMs face significant obstacles due to their massive parameter scale, which imposes memory, storage, and deployment challenges. Although recent expert merging methods promise greater efficiency by consolidating multiple experts, they are fundamentally hindered by parameter conflicts arising from expert specialization. In this paper, we present Sub-MoE, a novel MoE compression framework via Subspace Expert Merging. Our key insight is to perform joint Singular Value Decomposition (SVD) on concatenated expert weights, reducing conflicting parameters by extracting shared U-matrices while enabling effective merging of the expert-specific V components. Specifically, Sub-MoE consists of two innovative phases: (1) Adaptive Expert Clustering, which groups functionally coherent experts via K-means clustering based on cosine similarity of expert outputs; and (2)…
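The two ideas stated in the abstract, grouping experts by the cosine similarity of their outputs and running a joint SVD over concatenated expert weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `cluster_experts` and `merge_group`, the use of one summary output vector per expert, the fixed cluster count `k`, the truncation `rank`, and the plain averaging of the expert-specific blocks are all assumptions made for illustration; the abstract does not specify how the expert-specific components are combined.

```python
import numpy as np


def cluster_experts(outputs, k, iters=20, seed=0):
    """K-means on L2-normalized expert outputs (cosine-based grouping).

    outputs: array (num_experts, feature_dim), one row per expert, e.g. an
    expert's mean output on calibration tokens (an illustrative assumption).
    Returns one cluster label per expert.
    """
    rng = np.random.default_rng(seed)
    x = outputs / np.linalg.norm(outputs, axis=1, keepdims=True)
    # Initialize centers from k distinct experts (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Rows are unit vectors, so the dot product is the cosine similarity.
        labels = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                c = members.mean(axis=0)
                centers[j] = c / (np.linalg.norm(c) + 1e-12)
    return labels


def merge_group(weights, rank):
    """Joint SVD over one cluster of expert weight matrices.

    weights: list of (d_out, d_in) arrays belonging to the same cluster.
    Returns (u, s, v_merged); the merged weight is (u * s) @ v_merged.
    """
    n = len(weights)
    d_in = weights[0].shape[1]
    concat = np.concatenate(weights, axis=1)           # (d_out, n * d_in)
    u, s, vt = np.linalg.svd(concat, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank]        # shared basis, truncated
    # Split V^T column-wise into one (rank, d_in) block per expert.
    v_blocks = vt.reshape(rank, n, d_in).transpose(1, 0, 2)
    # Assumption: combine expert-specific blocks by a plain average.
    v_merged = v_blocks.mean(axis=0)
    return u, s, v_merged
```

A call such as `labels = cluster_experts(outputs, k=6)` followed by `merge_group([w for w, l in zip(weights, labels) if l == 0], rank=64)` would yield one merged expert for cluster 0. With the full rank retained, averaging the blocks reduces to averaging the original weights; any benefit over naive averaging would have to come from truncating the shared basis or from a smarter combination rule than the one assumed here.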
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Expert finding and Q&A systems
Methods: Mixture of Experts · ALIGN · k-Means Clustering · Pruning
