Rerouting LLM Routers
Avital Shafran, Roei Schuster, Thomas Ristenpart, Vitaly Shmatikov

TL;DR
This paper explores the adversarial vulnerabilities of LLM routers, showing how malicious inputs can manipulate routing decisions without degrading response quality, and discusses potential defenses.
Contribution
It introduces the concept of LLM control plane integrity, demonstrates a novel attack that uses confounder gadgets, and evaluates the attack's effectiveness along with possible defenses.
Findings
Adversaries can manipulate LLM routing decisions using confounder gadgets.
Confounder gadgets do not impact LLM response quality.
Perplexity-based filtering is ineffective against these attacks.
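To make the last finding concrete, the sketch below shows how a perplexity-based filter works in principle. It is a toy under stated assumptions: a hypothetical add-alpha unigram model stands in for the real language model, and the names `train_unigram`, `perplexity`, and `passes_filter` and the threshold are illustrative only. It shows the mechanism (PPL = exp of the mean negative log-probability, with queries above a threshold rejected), not the paper's evaluation setup or the reason the paper finds such filtering ineffective.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens, alpha=1.0):
    """Hypothetical toy LM: add-alpha smoothed unigram probabilities."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def prob(token):
        return (counts.get(token, 0) + alpha) / (total + alpha * vocab)

    return prob


def perplexity(tokens, prob):
    """PPL = exp(-(1/N) * sum_i log p(t_i))."""
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def passes_filter(query, prob, threshold):
    """Hypothetical filter: accept a query only if its perplexity is at most threshold."""
    return perplexity(query.split(), prob) <= threshold
```

In this toy, unusual tokens added to a query lower its average log-probability and so raise its perplexity. A real deployment would use a neural LM and a calibrated threshold.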
Abstract
LLM routers aim to balance quality and cost of generation by classifying queries and routing them to a cheaper or more expensive LLM depending on their complexity. Routers represent one type of what we call LLM control planes: systems that orchestrate use of one or more LLMs. In this paper, we investigate routers' adversarial robustness. We first define LLM control plane integrity, i.e., robustness of LLM orchestration to adversarial inputs, as a distinct problem in AI safety. Next, we demonstrate that an adversary can generate query-independent token sequences we call "confounder gadgets" that, when added to any query, cause LLM routers to send the query to a strong LLM. Our quantitative evaluation shows that this attack is successful both in white-box and black-box settings against a variety of open-source and commercial routers, and that confounding queries do not affect the…
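The routing setup and the attack described in the abstract can be illustrated with a toy model. This is a minimal sketch under loud assumptions, not the authors' routers or their gadget-generation method. A hypothetical keyword-based `toy_router_score` stands in for a learned router, `route` compares the score to a threshold to pick the weak or strong model, and `find_gadget` runs a simple greedy coordinate search for a fixed prefix that maximizes the mean router score over a set of queries. Averaging over queries is what makes the gadget query-independent. Prepending the gadget (rather than appending it) is also an assumption of this sketch.

```python
# Hypothetical stand-in for a learned router: higher score = "harder" query.
COMPLEX_WORDS = {"prove", "derive", "theorem", "optimize", "analyze"}


def toy_router_score(tokens):
    """Fraction of tokens that look 'complex' (toy proxy for a router's classifier)."""
    if not tokens:
        return 0.0
    return sum(t in COMPLEX_WORDS for t in tokens) / len(tokens)


def route(query, score_fn, threshold=0.5):
    """Send the query to the strong LLM if its score meets the threshold."""
    return "strong" if score_fn(query.split()) >= threshold else "weak"


def find_gadget(queries, vocab, score_fn, length=4, rounds=3):
    """Greedy coordinate search for one prefix that raises the mean score over all queries."""
    gadget = [vocab[0]] * length

    def objective(g):
        # Average over queries so the gadget is not tailored to any single one.
        return sum(score_fn(g + q.split()) for q in queries) / len(queries)

    best = objective(gadget)
    for _ in range(rounds):
        for i in range(length):
            for tok in vocab:
                cand = gadget[:i] + [tok] + gadget[i + 1:]
                val = objective(cand)
                if val > best:  # keep a substitution only if it strictly helps
                    gadget, best = cand, val
    return gadget
```

Usage: with queries such as `"tell me a joke"`, which the toy router sends to the weak model, `find_gadget(queries, vocab, toy_router_score, length=6)` returns a prefix that flips the decision to `"strong"` once it is prepended. The search uses only router scores. A black-box variant would run the same search against a surrogate router and hope the gadget transfers.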
Taxonomy
Topics: Advanced Data Storage Technologies · Network Packet Processing and Optimization · Interconnection Networks and Systems
