AIMER: Calibration-Free Task-Agnostic MoE Pruning

Zongfang Liu; Shengkun Tang; Yifan Shen; Huan Wang; Xin Yuan

arXiv:2603.18492·cs.LG·April 14, 2026

TL;DR

AIMER is a calibration-free, task-agnostic expert pruning method for MoE language models that achieves competitive performance with minimal scoring time.

Contribution

The paper introduces a simple, calibration-free importance criterion for expert pruning that outperforms calibration-dependent methods across large-scale models.

Findings

1. AIMER achieves strong performance at 25% and 50% pruning ratios.

2. It requires only 0.22 to 1.27 seconds to score experts.

3. AIMER outperforms state-of-the-art calibration-based pruning methods.

Abstract

Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token compute, but deployment still requires storing all experts, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, which makes pruning outcomes sensitive to the choice of calibration set and adds substantial preprocessing cost. We introduce AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that yields clear within-layer score separation and distinct expert stratification. Across 7B to 30B MoE language models at 25% and 50% pruning ratios over 16 benchmarks, AIMER consistently delivers competitive…
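To make the name of the criterion concrete, here is a minimal sketch of an "absolute mean over root mean square" score computed from weights alone, with no calibration data. It is an illustration under stated assumptions, not the authors' implementation: the abstract as shown here does not say which expert tensors are scored, how they are combined, or which end of the ranking is pruned. The function names, the flat-list weight representation, and the `prune_lowest` flag are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def abs_mean_over_rms(weights: Sequence[float]) -> float:
    """Return mean(|w|) / sqrt(mean(w^2)) for a flat sequence of weights.

    The ratio lies in [0, 1]: it equals 1 when all weights share the same
    magnitude and shrinks as the magnitudes become more uneven.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    abs_mean = sum(abs(w) for w in weights) / n
    rms = math.sqrt(sum(w * w for w in weights) / n)
    if rms == 0.0:
        return 0.0  # all-zero weights; a convention chosen for this sketch
    return abs_mean / rms


def rank_experts(layer: Dict[str, Sequence[float]]) -> List[str]:
    """Sort the experts of one layer by score, lowest first."""
    return sorted(layer, key=lambda name: abs_mean_over_rms(layer[name]))


def select_pruned(
    layer: Dict[str, Sequence[float]],
    ratio: float,
    prune_lowest: bool = True,  # ASSUMPTION: direction is not given in the source
) -> List[str]:
    """Pick the experts to drop from one layer at the given pruning ratio."""
    ranked = rank_experts(layer)
    k = int(len(ranked) * ratio)
    return ranked[:k] if prune_lowest else ranked[len(ranked) - k:]


# Toy layer with four hypothetical experts, each a flattened weight list.
toy_layer = {
    "e0": [1.0, 1.0, 1.0, 1.0],
    "e1": [0.0, 0.0, 0.0, 2.0],
    "e2": [3.0, 1.0, 1.0, 1.0],
    "e3": [0.0, 1.0, 0.0, 1.0],
}
to_drop = select_pruned(toy_layer, ratio=0.5)
```

Because the score depends only on the stored weights, it needs a single pass over each expert's parameters, which is consistent with the short scoring times reported above. Ranking is done per layer, matching the abstract's mention of within-layer score separation.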
