EvoESAP: Non-Uniform Expert Pruning for Sparse MoE

Zongfang Liu; Shengkun Tang; Boyang Sun; Zhiqiang Shen; Xin Yuan

arXiv:2603.06003 · cs.LG · April 14, 2026

TL;DR

EvoESAP is a non-uniform expert pruning method for sparse Mixture-of-Experts language models. It optimizes how the pruning budget is allocated across layers, so the pruned model retains more quality than uniform pruning while still cutting memory and compute costs.

Contribution

The paper proposes EvoESAP, an evolutionary search framework that optimizes non-uniform sparsity allocation across layers. The search compares candidates with ESAP, a bounded, stable metric that is cheap to evaluate, and the resulting allocations outperform uniform pruning.
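To make the idea of searching over layer-wise allocations concrete, here is a minimal sketch of an evolutionary loop under a fixed total pruning budget. It is not the authors' implementation: the mutation operator, the population settings, and the names `mutate`, `evolve`, `toy_fitness`, and `SENSITIVITY` are all assumptions made for illustration. In the paper's setting the fitness would presumably be the ESAP score of the pruned model; here a hypothetical per-layer sensitivity stands in for it.

```python
import random

def mutate(alloc, num_experts, rng):
    """Move one pruned expert from one layer to another (total stays fixed)."""
    child = list(alloc)
    # Layers that can give back one pruned expert.
    can_remove = [i for i, a in enumerate(child) if a > 0]
    if not can_remove:
        return child
    src = rng.choice(can_remove)
    # Layers that can prune one more expert while keeping at least one alive.
    can_add = [i for i, a in enumerate(child) if a < num_experts - 1 and i != src]
    if not can_add:
        return child
    dst = rng.choice(can_add)
    child[src] -= 1
    child[dst] += 1
    return child

def evolve(fitness, num_layers, num_experts, sparsity,
           generations=50, pop_size=16, seed=0):
    """Search for a per-layer pruned-expert count that maximizes `fitness`.

    alloc[i] is the number of experts pruned in layer i. The search starts
    from the uniform allocation and keeps the best half each generation.
    """
    rng = random.Random(seed)
    uniform = [int(num_experts * sparsity)] * num_layers
    population = [uniform]
    while len(population) < pop_size:
        population.append(mutate(rng.choice(population), num_experts, rng))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = [mutate(rng.choice(parents), num_experts, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

# Hypothetical stand-in for a real quality metric: each layer has a made-up
# sensitivity, and pruning an expert in a sensitive layer costs more.
SENSITIVITY = [3.0, 1.0, 0.5, 2.0]

def toy_fitness(alloc):
    return -sum(s * a for s, a in zip(SENSITIVITY, alloc))

best = evolve(toy_fitness, num_layers=4, num_experts=8, sparsity=0.5)
```

Because every mutation moves one unit of pruning between two layers, each candidate satisfies the same global budget as the uniform baseline, so candidates differ only in where the pruning lands.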

Findings

1. EvoESAP improves open-ended generation performance by up to 19.6% at 50% sparsity.

2. EvoESAP consistently outperforms uniform pruning across models from 7B to 30B parameters.

3. The method maintains competitive accuracy on multiple-choice tasks despite aggressive sparsity.

Abstract

Sparse Mixture-of-Experts (SMoE) language models achieve strong capability at low per-token compute, yet deployment remains constrained by memory footprint and throughput because the full expert pool must still be stored and served. Post-training expert pruning reduces this cost, but most methods focus on which experts to prune within each layer and default to a uniform layer-wise sparsity allocation, even though the layer-wise allocation can strongly affect performance. We decouple pruning into within-layer expert ranking and across-layer budget allocation, and introduce Expected Speculative Acceptance Proxy (ESAP), a speculative-decoding-inspired, teacher-forced metric that measures how well a pruned model matches the full model without costly autoregressive decoding. ESAP is bounded and stable, enabling cheap comparison of many candidates.…
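The abstract is truncated here and does not give ESAP's formula, so the following is only a sketch of one natural reading. In standard speculative sampling, a token drafted from distribution q is accepted against target distribution p with probability min(1, p/q), which makes the expected acceptance at a position equal to the sum over the vocabulary of min(p, q), a value between 0 and 1. The function below averages that quantity over positions, given logits that both models produce on the same reference tokens (teacher forcing). Treating this as ESAP is an assumption, and the names `softmax` and `expected_acceptance` are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_acceptance(full_logits, pruned_logits):
    """Mean over positions of sum_v min(p_full(v), p_pruned(v)).

    Both arguments are lists of per-position logit vectors computed on the
    same teacher-forced input, so no autoregressive decoding is needed.
    Assumption: this overlap score stands in for the paper's ESAP.
    """
    scores = []
    for full_row, pruned_row in zip(full_logits, pruned_logits):
        p = softmax(full_row)
        q = softmax(pruned_row)
        scores.append(sum(min(a, b) for a, b in zip(p, q)))
    return sum(scores) / len(scores)

# Toy logits for two positions over a three-token vocabulary.
full = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
pruned = [[3.0, 2.0, 1.0], [0.0, 0.0, 5.0]]
score_same = expected_acceptance(full, full)
score_diff = expected_acceptance(full, pruned)
```

Under this reading the score is 1 when the pruned model reproduces the full model's next-token distributions exactly and falls toward 0 as they diverge, which fits the abstract's description of the metric as bounded.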

Code & Models

Repositories

ZongfangLiu/EvoESAP (GitHub)
