EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization

Zhongqian Fu; Tianyi Zhao; Ning Ding; Xianzhi Yu; Xiaosong Li; Yehui Tang; Yunhe Wang

arXiv:2506.13329·cs.CL·February 3, 2026

TL;DR

EAQuant is a post-training quantization (PTQ) framework designed for Mixture-of-Experts (MoE) models. It addresses activation outliers, routing instability, and sparse expert calibration so that accuracy is preserved under ultra-low-bit settings.

Contribution

The paper introduces three expert-aware techniques for MoE quantization: smoothing aggregation, routing consistency alignment, and calibration data balance. Together they improve accuracy and robustness over existing PTQ methods in ultra-low-bit scenarios.

Findings

01. Achieves up to 13.81% accuracy improvement over existing methods.
02. Demonstrates robustness under aggressive quantization settings such as W4A4 and W2A4.
03. Establishes a new state of the art in MoE model compression.
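
For readers unfamiliar with the notation, WxAy denotes x-bit weights and y-bit activations, so W4A4 quantizes both to 4 bits and W2A4 uses 2-bit weights with 4-bit activations. The sketch below is a minimal illustration of what those bit-widths mean, using plain symmetric per-tensor quantize-dequantize in NumPy. It is not the EAQuant method; the helper names `fake_quantize` and `rel_error` and the random test tensors are assumptions made for this example.

```python
import numpy as np

def fake_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor quantize-dequantize (illustrative baseline only)."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 1 for 2-bit
    max_abs = np.abs(x).max()
    if max_abs == 0:
        return x.copy()
    scale = max_abs / qmax
    # Round to the signed integer grid [-qmax - 1, qmax], then map back to floats.
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def rel_error(a: np.ndarray, w: np.ndarray, w_bits: int, a_bits: int) -> float:
    """Relative error of a quantized matmul against the full-precision result."""
    ref = a @ w
    out = fake_quantize(a, a_bits) @ fake_quantize(w, w_bits)
    return float(np.linalg.norm(out - ref) / np.linalg.norm(ref))

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))   # hypothetical weight matrix
a = rng.normal(size=(8, 64))    # hypothetical activations

err_w4a4 = rel_error(a, w, w_bits=4, a_bits=4)
err_w2a4 = rel_error(a, w, w_bits=2, a_bits=4)
print(f"W4A4 relative error: {err_w4a4:.3f}")
print(f"W2A4 relative error: {err_w2a4:.3f}")
```

With a naive scheme like this, dropping the weights from 4 to 2 bits leaves only four representable levels, which is why these settings are considered aggressive.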

Abstract

Mixture-of-Experts (MoE) models enable scalable computation and performance in large-scale deep learning but face quantization challenges due to sparse expert activation and dynamic routing. Existing post-training quantization (PTQ) methods fail to address activation outliers, routing instability, and sparse expert calibration, leading to significant performance degradation. To address this, we propose EAQuant, a PTQ framework tailored for MoE architectures. Our method introduces three expert-aware innovations: (1) smoothing aggregation to suppress activation outliers, (2) routing consistency alignment to preserve expert selection post-quantization, and (3) calibration data balance to optimize sparsely activated experts. These strategies collectively enable robust, high-precision quantization of MoE models under ultra-low-bit constraints. Extensive experiments across several extreme…
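
The abstract names the three techniques without giving formulas, so the following is only a rough sketch of the kind of quantity each one is concerned with, not the authors' implementation. It assumes SmoothQuant-style per-channel scales merged across experts with a channel-wise max, a top-k agreement score as a routing diagnostic, and per-expert token counts as a calibration-balance check. All function names, shapes, and the noise used as a stand-in for quantization error are hypothetical.

```python
import numpy as np

def aggregated_smoothing_scale(x, expert_weights, alpha=0.5, eps=1e-8):
    """One per-channel smoothing vector shared by all experts.

    x: (tokens, d_in) activations fed to every expert.
    expert_weights: list of (d_in, d_out) matrices, one per expert.
    Assumption: per-expert scales are merged with a channel-wise max.
    """
    act_max = np.abs(x).max(axis=0) + eps                       # (d_in,)
    per_expert = [
        act_max ** alpha / (np.abs(w).max(axis=1) + eps) ** (1 - alpha)
        for w in expert_weights
    ]
    return np.max(np.stack(per_expert), axis=0)                 # (d_in,)

def topk_routing_agreement(logits_fp, logits_q, k=2):
    """Fraction of tokens whose top-k expert set is unchanged."""
    top_fp = np.sort(np.argsort(-logits_fp, axis=1)[:, :k], axis=1)
    top_q = np.sort(np.argsort(-logits_q, axis=1)[:, :k], axis=1)
    return float(np.mean(np.all(top_fp == top_q, axis=1)))

def expert_token_counts(logits, num_experts, k=2):
    """Number of calibration tokens each expert receives under top-k routing."""
    chosen = np.argsort(-logits, axis=1)[:, :k].ravel()
    return np.bincount(chosen, minlength=num_experts)

rng = np.random.default_rng(0)
tokens, d_in, d_out, num_experts = 256, 16, 32, 4
x = rng.normal(size=(tokens, d_in))
x[:, 3] *= 50.0                                   # inject one outlier channel
experts = [rng.normal(size=(d_in, d_out)) for _ in range(num_experts)]

# (1) Smoothing: divide activations by s, fold s into every expert's weights.
s = aggregated_smoothing_scale(x, experts)
x_smooth = x / s
experts_smooth = [w * s[:, None] for w in experts]

# (2) Routing consistency: compare expert selection before and after perturbation.
router = rng.normal(size=(d_in, num_experts))
logits_fp = x @ router
logits_q = logits_fp + rng.normal(scale=0.05, size=logits_fp.shape)  # stand-in noise
agreement = topk_routing_agreement(logits_fp, logits_q)

# (3) Calibration balance: find experts that see few calibration tokens.
counts = expert_token_counts(logits_fp, num_experts)
print("max |x| before/after smoothing:", np.abs(x).max(), np.abs(x_smooth).max())
print("top-2 routing agreement:", agreement)
print("tokens per expert:", counts)
```

Because the scale is divided out of the activations and multiplied into the weights, each expert's output is mathematically unchanged before quantization; only the dynamic range seen by the quantizer shifts.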

Taxonomy

Topics: Scientific Computing and Data Management · Context-Aware Activity Recognition Systems

Methods: Mixture of Experts