EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
Zhongqian Fu, Tianyi Zhao, Ning Ding, Xianzhi Yu, Xiaosong Li, Yehui Tang, Yunhe Wang

TL;DR
EAQuant is a novel post-training quantization framework specifically designed for Mixture-of-Experts models, addressing activation outliers, routing stability, and sparse expert calibration to enable high-precision compression under ultra-low-bit settings.
Contribution
The paper introduces three expert-aware techniques for MoE quantization, significantly improving accuracy and robustness over existing methods in ultra-low-bit scenarios.
Findings
Achieves up to 13.81% accuracy improvement over existing methods.
Demonstrates robustness under aggressive quantization settings like W4A4 and W2A4.
Establishes new state-of-the-art in MoE model compression.
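For readers unfamiliar with the notation, W4A4 means 4-bit weights with 4-bit activations, and W2A4 means 2-bit weights with 4-bit activations. The sketch below is a generic symmetric uniform "fake quantization" in NumPy. It is not EAQuant's quantizer, and the helper names `fake_quantize` and `quantization_mse` are hypothetical. It only illustrates why rounding error grows quickly as the bit width shrinks.

```python
import numpy as np

def fake_quantize(x, bits):
    """Symmetric per-tensor uniform quantization, returned in floating point.

    Illustrative only: a generic quantizer, not the scheme used by EAQuant.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4 bits, 1 for 2 bits
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()                  # all zeros: nothing to quantize
    scale = max_abs / qmax               # one scale shared by the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                     # dequantize back to float

def quantization_mse(x, bits):
    """Mean squared error introduced by quantizing x to the given bit width."""
    return float(np.mean((x - fake_quantize(x, bits)) ** 2))

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)          # synthetic stand-in for a weight tensor
for bits in (8, 4, 2):
    print(f"{bits}-bit MSE: {quantization_mse(weights, bits):.6f}")
```

Because a single large value sets the scale for the whole tensor, outliers waste most of the few available levels, which is why outlier suppression matters at these bit widths.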
Abstract
Mixture-of-Experts (MoE) models enable scalable computation and performance in large-scale deep learning but face quantization challenges due to sparse expert activation and dynamic routing. Existing post-training quantization (PTQ) methods fail to address activation outliers, routing instability, and sparse expert calibration, leading to significant performance degradation. To address this, we propose EAQuant, a PTQ framework tailored for MoE architectures. Our method introduces three expert-aware innovations: (1) smoothing aggregation to suppress activation outliers, (2) routing consistency alignment to preserve expert selection post-quantization, and (3) calibration data balance to optimize sparsely activated experts. These strategies collectively enable robust, high-precision quantization of MoE models under ultra-low-bit constraints. Extensive experiments across several extreme…
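The two MoE-specific problems named in the abstract, routing instability and sparsely calibrated experts, can be made concrete with a small diagnostic. The sketch below is not the paper's algorithm. It assumes a toy linear router on synthetic data, and every name in it (`quantize_per_token`, `top_k_experts`, `routing_agreement`, `tokens_per_expert`) is hypothetical. It measures how often top-k expert selection survives 4-bit activation quantization, and how many calibration tokens each expert receives.

```python
import numpy as np

def quantize_per_token(x, bits):
    """Symmetric fake quantization with one scale per row (token)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x), axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)     # guard all-zero rows
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def top_k_experts(logits, k):
    """Indices of the k largest logits per token, sorted for set comparison."""
    idx = np.argpartition(-logits, k - 1, axis=1)[:, :k]
    return np.sort(idx, axis=1)

def routing_agreement(logits_fp, logits_q, k):
    """Fraction of tokens whose top-k expert set is unchanged."""
    same = np.all(top_k_experts(logits_fp, k) == top_k_experts(logits_q, k), axis=1)
    return float(np.mean(same))

def tokens_per_expert(logits, k, num_experts):
    """How many tokens are routed to each expert under top-k routing."""
    return np.bincount(top_k_experts(logits, k).ravel(), minlength=num_experts)

# Synthetic stand-ins (assumptions): 512 tokens, hidden size 64, 8 experts, top-2.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(512, 64))
router = rng.normal(size=(64, 8))

logits_fp = tokens @ router                              # full-precision routing
logits_q = quantize_per_token(tokens, bits=4) @ router   # routing on 4-bit activations

print("top-2 routing agreement:", routing_agreement(logits_fp, logits_q, k=2))
print("tokens per expert:", tokens_per_expert(logits_fp, k=2, num_experts=8))
```

An agreement below 1.0 means quantization alone has rerouted some tokens, and experts with low token counts are the ones a calibration set covers poorly.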
Taxonomy
Topics: Scientific Computing and Data Management · Context-Aware Activity Recognition Systems
Methods: Mixture of Experts
