AIMER: Calibration-Free Task-Agnostic MoE Pruning
Zongfang Liu, Shengkun Tang, Yifan Shen, Huan Wang, Xin Yuan

TL;DR
AIMER is a calibration-free, task-agnostic expert pruning method for MoE language models that achieves competitive performance with minimal scoring time.
Contribution
It introduces a simple, calibration-free importance criterion for expert pruning that outperforms calibration-dependent methods across large-scale models.
Findings
AIMER performs strongly at 25% and 50% pruning ratios on 7B to 30B MoE language models across 16 benchmarks.
It requires only 0.22 to 1.27 seconds to score experts.
AIMER outperforms state-of-the-art calibration-based pruning methods.
Abstract
Mixture-of-Experts (MoE) language models increase parameter capacity without a proportional increase in per-token compute, but deployment still requires storing all experts, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, which makes pruning outcomes sensitive to the choice of calibration set and adds substantial preprocessing cost. We introduce AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that yields clear within-layer score separation and distinct expert stratification. Across 7B to 30B MoE language models at 25% and 50% pruning ratios over 16 benchmarks, AIMER consistently delivers competitive…
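The abstract names the criterion but does not spell out its formula, so the following is only a minimal sketch of one reading of the acronym: the mean absolute value of an expert's weights divided by their root mean square. Everything beyond that ratio is an assumption made for illustration, including the function names `abs_mean_over_rms` and `select_experts`, the choice to pool all of an expert's weight matrices into one vector, and the `keep_highest` flag, which is left explicit because the abstract does not say whether high-scoring or low-scoring experts are the ones retained.

```python
import numpy as np

def abs_mean_over_rms(weights):
    """Score one expert from its weights alone (no calibration data).

    `weights` is a list of arrays, e.g. the expert's projection matrices.
    Pooling them into a single vector is an assumption of this sketch.
    """
    flat = np.concatenate([np.asarray(w, dtype=np.float64).ravel() for w in weights])
    rms = np.sqrt(np.mean(flat ** 2))
    if rms == 0.0:
        return 0.0  # all-zero expert: define the score as 0 to avoid dividing by zero
    return float(np.mean(np.abs(flat)) / rms)

def select_experts(layer_experts, prune_ratio, keep_highest):
    """Rank the experts of one layer and return the indices to keep.

    `keep_highest` is a hypothetical switch: the ranking direction is not
    stated in the abstract, so the caller must choose it.
    """
    scores = np.array([abs_mean_over_rms(ws) for ws in layer_experts])
    n_keep = len(scores) - int(prune_ratio * len(scores))
    order = np.argsort(scores)          # ascending by score
    if keep_highest:
        order = order[::-1]             # descending by score
    kept = sorted(order[:n_keep].tolist())
    return kept, scores
```

Because the ratio is unchanged when all weights are multiplied by a constant, a score of this form depends on the shape of the weight distribution rather than its overall magnitude, and it needs only a single pass over the parameters.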
