EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models

Yuanteng Chen; Yuantian Shao; Peisong Wang; Jian Cheng

arXiv:2508.01625 · cs.LG · August 5, 2025

TL;DR

EAC-MoE compresses Mixture-of-Experts (MoE) large language models in two ways: it calibrates the routers so that low-bit quantization does not skew expert selection, and it prunes less-used experts. Together these reduce GPU memory usage and accelerate inference.

Contribution

The paper presents EAC-MoE, an expert-selection aware compressor that combines quantization with expert-selection calibration and expert pruning to improve the efficiency of MoE-LLMs.

Findings

01. Reduces GPU memory consumption significantly.

02. Improves inference speed with minimal performance loss.

03. Mitigates the expert selection bias caused by low-bit quantization by calibrating the routers.

Abstract

Mixture-of-Experts (MoE) has demonstrated promising potential in scaling LLMs. However, it is hindered by two critical challenges: (1) loading all experts consumes substantial GPU memory; (2) the low number of activated parameters does not translate into an equivalent inference speedup. In this work, we propose EAC-MoE, an Expert-Selection Aware Compressor for MoE-LLMs, which aligns closely with the characteristics of MoE from the perspectives of quantization and pruning and introduces two modules to address these two challenges respectively: (1) The expert selection bias caused by low-bit quantization is a major factor contributing to the performance degradation in MoE-LLMs. Based on this, we propose Quantization with Expert-Selection Calibration (QESC), which mitigates the expert selection bias by calibrating the routers within the MoE; (2) There are always certain experts that are…
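
To make the notion of expert selection bias concrete, the following is a minimal sketch, not the paper's QESC procedure. It assumes a toy linear router with top-k selection and simulates quantization error by rounding the router's input to a few bits; the actual source of error in EAC-MoE may differ. All names (`quantize`, `top_k_experts`, `selection_shift_rate`) and the random data are hypothetical, chosen only for illustration.

```python
import random


def quantize(values, bits):
    """Symmetric round-to-nearest quantization of a list of floats.

    Illustrative only; assumes bits >= 2 so that at least one nonzero level exists.
    """
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def top_k_experts(hidden, router_weights, k):
    """Return the set of indices of the k experts with the highest router logits."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return frozenset(ranked[:k])


def selection_shift_rate(tokens, router_weights, k, bits):
    """Fraction of tokens whose top-k expert set changes once the input is quantized."""
    changed = 0
    for hidden in tokens:
        reference = top_k_experts(hidden, router_weights, k)
        perturbed = top_k_experts(quantize(hidden, bits), router_weights, k)
        changed += reference != perturbed
    return changed / len(tokens)


# Hypothetical toy setup: 8 experts, 16-dimensional hidden states, top-2 routing.
rng = random.Random(0)
num_experts, dim, k = 8, 16, 2
router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]

for bits in (8, 4, 2):
    rate = selection_shift_rate(tokens, router, k, bits)
    print(f"{bits}-bit input: {rate:.1%} of tokens routed to a different expert set")
```

The shift rate is one simple way to quantify how often a perturbed router picks different experts than the full-precision one. Under the abstract's framing, a calibration step would aim to drive a quantity like this back toward zero.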

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Advanced Neural Network Applications