LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
Jiawei Hao, Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Dan Zeng

TL;DR
LightMoE introduces expert replacing, a compression paradigm that cuts the memory and training costs of MoE-based LLMs while preserving performance through adaptive expert selection and hierarchical expert construction.
Contribution
The paper proposes LightMoE, a framework that compresses MoE models by replacing redundant experts with parameter-efficient modules and recovering their capabilities at low training cost.
Findings
Matches LoRA fine-tuning performance at 30% compression
Outperforms existing methods at 50% compression
Achieves 5.6% average performance improvement across tasks
Abstract
Mixture-of-Experts (MoE) based Large Language Models (LLMs) have demonstrated impressive performance and computational efficiency. However, their deployment is often constrained by substantial memory demands, primarily due to the need to load numerous expert modules. While existing expert compression techniques like pruning or merging attempt to mitigate this, they often suffer from irreversible knowledge loss or high training overhead. In this paper, we propose a novel expert compression paradigm termed expert replacing, which replaces redundant experts with parameter-efficient modules and recovers their capabilities with low training costs. We find that even a straightforward baseline of this paradigm yields promising performance. Building on this foundation, we introduce LightMoE, a framework that enhances the paradigm by introducing adaptive expert selection, hierarchical expert…
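To make the memory argument concrete, the following is a minimal sketch of the general idea of expert replacing, not the authors' implementation. It assumes that redundancy is judged by routing frequency and that the parameter-efficient module is a low-rank factorization; both choices, along with the names `Expert`, `LowRankReplacement`, `replace_redundant_experts`, the layer sizes, and the routing counts, are hypothetical and chosen only for illustration. The code counts parameters rather than running a model.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    """A dense feed-forward expert, tracked here only by its shape."""
    d_model: int
    d_ff: int

    def num_params(self) -> int:
        # Two projection matrices: d_model -> d_ff and d_ff -> d_model.
        return 2 * self.d_model * self.d_ff


@dataclass
class LowRankReplacement:
    """Hypothetical parameter-efficient stand-in for a replaced expert."""
    d_model: int
    d_ff: int
    rank: int

    def num_params(self) -> int:
        # Each projection is factored into two matrices of rank `rank`.
        return 2 * self.rank * (self.d_model + self.d_ff)


def replace_redundant_experts(experts, usage_counts, ratio, rank):
    """Replace the least-used fraction `ratio` of experts with low-rank modules.

    Selecting by routing frequency is an assumption made for illustration.
    """
    n_replace = int(len(experts) * ratio)
    # Expert indices sorted from least to most frequently routed.
    order = sorted(range(len(experts)), key=lambda i: usage_counts[i])
    to_replace = set(order[:n_replace])
    return [
        LowRankReplacement(e.d_model, e.d_ff, rank) if i in to_replace else e
        for i, e in enumerate(experts)
    ]


def total_params(modules) -> int:
    """Sum the parameter counts of a list of expert-like modules."""
    return sum(m.num_params() for m in modules)


experts = [Expert(d_model=1024, d_ff=4096) for _ in range(8)]
usage_counts = [900, 40, 750, 15, 620, 30, 880, 25]  # made-up routing counts
compressed = replace_redundant_experts(experts, usage_counts, ratio=0.5, rank=16)

before = total_params(experts)
after = total_params(compressed)
print(f"expert parameters: {before} -> {after} ({1 - after / before:.1%} saved)")
```

In this toy setting the replaced experts contribute almost nothing to the parameter count, so the saving tracks the fraction of experts replaced. The sketch says nothing about accuracy; per the abstract, the replacement modules still need training to recover the capabilities of the experts they stand in for.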
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Advanced Neural Network Applications
