PreMoE: Proactive Inference for Efficient Mixture-of-Experts

Zehua Pei; Ying Zhang; Hui-Ling Zhen; Tao Yuan; Xianzhi Yu; Zhenhua Dong; Sinno Jialin Pan; Mingxuan Yuan; Bei Yu

arXiv:2505.17639·cs.LG·April 27, 2026

TL;DR

PreMoE is a training-free framework that efficiently compiles sparse Mixture-of-Experts models for specific deployment scenarios using a novel expert utility metric.

Contribution

PreMoE introduces Predicted Expert Utility (PEU), a metric that produces domain-aware expert rankings without retraining, so a model can be compiled either as a domain-specific specialist or as a multi-domain generalist.

Findings

01. PreMoE achieves up to 50% sparsity with minimal performance loss.

02. PEU provides stable expert importance estimation under high sparsity.

03. Domain-specific specialists and multi-domain generalists can be efficiently compiled.

Abstract

Mixture-of-Experts (MoE) models offer dynamic computation, but are typically deployed as static full-capacity models, missing opportunities for deployment-specific specialization. We introduce PreMoE, a training-free framework that proactively compiles sparse MoE variants for targeted deployment scenarios. At its core is Predicted Expert Utility (PEU), a robust metric for estimating expert importance from router logits through high-confidence threshold filtering and logit transformation, which together stabilize utility estimation under aggressive sparsity. Using PEU scores computed on a small calibration set, PreMoE produces domain-aware expert rankings that can be used to compile either domain-specific specialists or high-efficiency multi-domain generalists, without any retraining. Across MoE models ranging from 30B to 718B parameters, PreMoE achieves up to 50% sparsity with nearly…
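The abstract names the ingredients of PEU (router logits, high-confidence threshold filtering, a logit transformation, a small calibration set) but does not give the formula. The following is only a minimal sketch of how such a ranking could be wired together, not the authors' method. Everything specific in it is an assumption made for illustration: softmax as the logit transformation, a threshold of 0.5 on the top routing probability, accumulating only the top-1 probability per token, and the hypothetical function names `expert_utility` and `select_experts`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_utility(router_logits, confidence_threshold=0.5):
    """Accumulate a per-expert utility score over calibration tokens.

    router_logits: list of per-token lists, one logit per expert.
    ASSUMPTION: softmax stands in for the paper's logit transformation,
    and only tokens whose top routing probability reaches the threshold
    contribute (a stand-in for high-confidence filtering).
    """
    num_experts = len(router_logits[0])
    utility = [0.0] * num_experts
    for token_logits in router_logits:
        probs = softmax(token_logits)
        top = max(range(num_experts), key=lambda e: probs[e])
        if probs[top] >= confidence_threshold:
            utility[top] += probs[top]
    return utility


def select_experts(utility, sparsity=0.5):
    """Return the indices of the experts kept at the given sparsity."""
    keep = max(1, round(len(utility) * (1 - sparsity)))
    ranked = sorted(range(len(utility)), key=lambda e: utility[e], reverse=True)
    return sorted(ranked[:keep])


# Toy calibration set: 4 tokens routed over 4 experts (made-up numbers).
calibration_logits = [
    [4.0, 0.0, 0.0, 0.0],  # confident in expert 0
    [0.0, 3.0, 0.0, 0.0],  # confident in expert 1
    [0.1, 0.0, 0.2, 0.0],  # low confidence, filtered out
    [5.0, 0.0, 0.0, 0.0],  # confident in expert 0
]
scores = expert_utility(calibration_logits)
kept = select_experts(scores, sparsity=0.5)
```

In this toy run the low-confidence token contributes nothing, so experts 2 and 3 score zero and are the ones dropped at 50% sparsity. Running the same procedure on calibration data from a different domain would give a different ranking, which is the sense in which the abstract calls the rankings domain-aware.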

Code & Models

Repositories

jarvispei/PreMoe (GitHub)
