Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density

Zhendong Mi; Yixiao Chen; Pu Zhao; Xiaodong Yu; Hao Wang; Yanzhi Wang; Shaoyi Huang

arXiv:2602.09316 · cs.LG · February 13, 2026

TL;DR

This paper introduces RFID-MoE, an MoE compression framework that allocates compression resources adaptively according to expert importance and reconstructs residuals efficiently, reducing model size substantially while maintaining performance.

Contribution

The paper proposes a heterogeneity-aware compression method for MoE LLMs that combines expert-importance metrics with residual reconstruction, and reports that it outperforms existing techniques.

Findings

01. Achieves a perplexity of 16.92 on PTB at a 60% compression ratio.

02. Reduces perplexity by more than 8.0 relative to baselines.

03. Improves zero-shot accuracy on HellaSwag by approximately 8%.

Abstract

Mixture-of-Experts (MoE) based Large Language Models (LLMs) have achieved superior performance, yet the massive memory overhead caused by storing multiple expert networks severely hinders their practical deployment. Singular Value Decomposition (SVD)-based compression has emerged as a promising post-training technique; however, most existing methods apply uniform rank allocation or rely solely on static weight properties. This overlooks the substantial heterogeneity in expert utilization observed in MoE models, where frequent routing patterns and intrinsic information density vary significantly across experts. In this work, we propose RFID-MoE, an effective framework for MoE compression by exploiting heterogeneous Routing Frequency and Information Density. We first introduce a fused metric that combines expert activation frequency with effective rank to measure expert importance,…
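The idea of fusing routing frequency with effective rank can be illustrated with a small sketch. The truncated abstract does not give the actual formula, so everything below is an assumption made for illustration: the entropy-based definition of effective rank, the weighted geometric mean used as the fusion rule, the `alpha` parameter, and the proportional rank allocation are hypothetical choices, not the authors' method. All function names are invented for this example.

```python
import numpy as np


def effective_rank(weight):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values (one common definition, assumed here)."""
    s = np.linalg.svd(weight, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so the logarithm is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def fused_importance(freqs, eff_ranks, alpha=0.5):
    """Hypothetical fusion rule: weighted geometric mean of normalized
    routing frequency and normalized effective rank. Returns scores
    that sum to 1."""
    f = np.asarray(freqs, dtype=float)
    r = np.asarray(eff_ranks, dtype=float)
    f = f / f.sum()
    r = r / r.sum()
    score = f**alpha * r ** (1.0 - alpha)
    return score / score.sum()


def allocate_ranks(scores, total_rank, max_rank, min_rank=1):
    """Split a rank budget in proportion to the scores, clipped to
    [min_rank, max_rank]. Rounding and clipping mean the result may not
    sum exactly to total_rank."""
    raw = np.asarray(scores, dtype=float) * total_rank
    return np.clip(np.round(raw).astype(int), min_rank, max_rank)


def truncate_svd(weight, rank):
    """Rank-`rank` factorization A @ B of `weight` via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four toy "experts" with made-up routing frequencies.
    experts = [rng.standard_normal((16, 8)) for _ in range(4)]
    freqs = [0.55, 0.25, 0.15, 0.05]
    scores = fused_importance(freqs, [effective_rank(w) for w in experts])
    ranks = allocate_ranks(scores, total_rank=16, max_rank=8)
    for w, k in zip(experts, ranks):
        a, b = truncate_svd(w, int(k))
        err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
        print(f"rank={k} relative_error={err:.3f}")
```

In this sketch, experts that are routed to more often or whose weights carry more spectral entropy receive a larger share of the rank budget, which is the heterogeneity the abstract contrasts with uniform rank allocation.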

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Wireless Signal Modulation Classification · Advanced Data and IoT Technologies