ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
Wenyong Zhou, Yuannuo Feng, Yizhe Chen, Taiqiang Wu, Wendong Xu, Wenbo Qi, Zhengwu Liu, Wang Kang, Ngai Wong

TL;DR
This paper introduces ROMER, a calibration framework for MoE LLMs on analog CIM hardware, addressing noise-induced load imbalance and routing issues to improve model performance.
Contribution
ROMER is the first systematic calibration method that compensates for hardware noise in MoE LLMs on analog CIM systems, enhancing robustness and efficiency.
Findings
ROMER reduces perplexity by up to 59.8% under real-chip noise.
Hardware noise disrupts expert load balance and routing decisions.
ROMER improves model robustness across multiple MoE architectures.
Abstract
Large language models (LLMs) with mixture-of-experts (MoE) architectures achieve remarkable scalability by sparsely activating a subset of experts per token, yet their frequent expert switching creates memory bandwidth bottlenecks that compute-in-memory (CIM) architectures are well-suited to mitigate. However, analog CIM systems suffer from inherent hardware imperfections that perturb stored weights, and its negative impact on MoE-based LLMs in noisy CIM environments remains unexplored. In this work, we present the first systematic investigation of MoE-based LLMs under noise model calibrated with real chip measurements, revealing that hardware noise critically disrupts expert load balance and renders clean-trained routing decisions consistently suboptimal. Based on these findings, we propose ROMER, a post-training calibration framework that (1) replaces underactivated experts with…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
