RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress
Ruixuan Huang, Qingyue Wang, Hantao Huang, Yudong Gao, Dong Chen, Shuai Wang, Wei Wang

TL;DR
This paper uncovers a vulnerability in Mixture-of-Experts (MoE) large language models: adversarial prompts cause severe load imbalance across experts, which increases inference latency and can lead to denial of service. It introduces RepetitionCurse to exploit this flaw.
Contribution
The paper identifies a universal routing flaw in MoE models and proposes RepetitionCurse, a black-box attack method that exploits this vulnerability across different models.
Findings
Adversarial prompts cause routing imbalance in MoE models.
RepetitionCurse increases inference latency by over 3x.
The vulnerability can be exploited in a model-agnostic manner.
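A simple straggler model makes the latency finding more intuitive. It assumes that each MoE layer step finishes only when the busiest device finishes. The helper below, `load_imbalance` (a hypothetical name), computes the ratio of the busiest device's token load to the mean load. The device counts are illustrative and are not data from the paper. The paper's >3x figure is a measured result and is not derived from this formula.

```python
def load_imbalance(device_loads):
    """Ratio of the busiest device's load to the mean device load.

    Assumption (simple straggler model): a layer step waits for the
    slowest device, so this ratio approximates the slowdown relative
    to a perfectly balanced assignment of the same total work.
    """
    if not device_loads or sum(device_loads) == 0:
        raise ValueError("need at least one device with nonzero load")
    mean = sum(device_loads) / len(device_loads)
    return max(device_loads) / mean


# Hypothetical token counts per device for one MoE layer.
balanced = [256, 256, 256, 256]       # work spread evenly
concentrated = [1024, 0, 0, 0]        # every routed token lands on one device
print(load_imbalance(balanced))       # 1.0
print(load_imbalance(concentrated))   # 4.0
```

In this model the concentrated case is bounded by the device count. Real latency also depends on communication, batching, and kernel efficiency, none of which are modeled here.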
Abstract
Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of explicit load-balancing constraints during inference allows adversarial inputs to trigger severe routing concentration. We demonstrate that out-of-distribution prompts can manipulate the routing strategy such that all tokens are consistently routed to the same set of top-k experts, which creates computational bottlenecks on certain devices while forcing others to idle. This converts an efficiency mechanism into a denial-of-service attack vector, leading to violations of service-level agreements for time to first token. We propose RepetitionCurse, a low-cost black-box strategy…
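The toy simulation below illustrates the mechanism the abstract describes. A linear router scores every expert, and each token goes to its top-k experts. Experts are split evenly across devices, with consecutive expert indices per device, and no load-balancing constraint applies at inference time. All names, sizes, and the contiguous placement are assumptions chosen for illustration. The "repeated" input feeds the router the identical hidden vector for every token. In a real model, context makes hidden states for repeated tokens differ somewhat, so this is only a sketch of routing concentration and not the paper's RepetitionCurse attack.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(hidden, router_weights, k):
    """Return the indices of the k experts with the highest gate probability."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    probs = softmax(logits)
    return sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]


def device_loads(tokens, router_weights, k, experts_per_device):
    """Count routed token slots per device (assumed contiguous expert placement)."""
    num_devices = len(router_weights) // experts_per_device
    loads = [0] * num_devices
    for hidden in tokens:
        for expert in route_top_k(hidden, router_weights, k):
            loads[expert // experts_per_device] += 1
    return loads


random.seed(0)
NUM_EXPERTS, DIM, K, EXPERTS_PER_DEVICE = 16, 8, 2, 4  # hypothetical sizes
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

diverse = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(256)]
repeated = [diverse[0]] * 256  # identical hidden state for every token

print(device_loads(diverse, router, K, EXPERTS_PER_DEVICE))   # load spread over devices
print(device_loads(repeated, router, K, EXPERTS_PER_DEVICE))  # at most 2 devices busy
```

Because the router is deterministic, identical inputs always select the same top-k experts. All 512 routed slots therefore land on at most k devices, and the remaining devices sit idle.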
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks · Privacy-Preserving Technologies in Data
