Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

Zeliang Zhang; Xiaodong Liu; Hao Cheng; Chenliang Xu; Jianfeng Gao

arXiv:2407.09590 · cs.CL · June 10, 2025

TL;DR

This paper introduces a method that improves parameter efficiency in sparse Mixture-of-Experts models by grouping and pruning redundant experts, validated on three MoE architectures (Mixtral, Deepseek-MoE, and Qwen) across a range of natural language tasks.

Contribution

It proposes an expert grouping and pruning technique that reduces redundancy among experts in MoE models, improving parameter efficiency while outperforming other pruning methods on a range of natural language tasks.

Findings

01. Pruning similar experts improves model efficiency.

02. The method outperforms existing pruning techniques.

03. Effective across multiple MoE architectures.

Abstract

By increasing the number of model parameters but activating them sparsely when performing a task, the Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption caused by the growing number of experts makes these models challenging to deploy in many real-world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures, including Mixtral, Deepseek-MoE, and Qwen. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. We will release our code to facilitate…
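The grouping-and-pruning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how similarity is measured or how a group is reduced, so the cosine similarity over flattened weights, the greedy grouping rule, the `threshold` value, the averaging step, and all function names below are assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_experts(experts, threshold=0.95):
    """Greedy grouping (assumed rule): an expert joins the first group whose
    representative (its first member) is at least `threshold` similar to it;
    otherwise it starts a new group."""
    groups = []
    for idx, weights in enumerate(experts):
        for group in groups:
            if cosine(experts[group[0]], weights) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


def prune_experts(experts, threshold=0.95):
    """Replace each group of similar experts with one averaged expert
    (averaging is an assumption; keeping one member would also work)."""
    groups = group_experts(experts, threshold)
    merged = []
    for group in groups:
        dim = len(experts[group[0]])
        merged.append(
            [sum(experts[i][d] for i in group) / len(group) for d in range(dim)]
        )
    return groups, merged


# Toy "experts": experts 0 and 1 are nearly identical, 2 and 3 are distinct.
toy_experts = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
groups, pruned = prune_experts(toy_experts)
print(groups)       # indices of the original experts in each group
print(len(pruned))  # number of experts kept after pruning
```

In a real MoE layer the router would presumably also need to be updated so that tokens previously sent to a pruned expert reach its group's replacement; that step is omitted here.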

Taxonomy

Topics: Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing

Methods: Mixture of Experts · Pruning