RerouteGuard: Understanding and Mitigating Adversarial Risks for LLM Routing

Wenhui Zhang; Huiyu Xu; Zhibo Wang; Zhichao Li; Zeqing He; Xuelin Wei; Kui Ren

arXiv:2601.21380·cs.CR·January 30, 2026

Open Access

TL;DR

This paper shows that LLM routing systems are vulnerable to adversarial rerouting attacks, analyzes how those attacks work, and proposes RerouteGuard, a detection framework that catches over 99% of rerouting attacks with minimal impact on legitimate queries.

Contribution

The work systematically characterizes LLM rerouting threats, evaluates real-world vulnerabilities, and introduces RerouteGuard, a scalable guardrail framework for adversarial rerouting detection.

Findings

01 Existing routing systems are vulnerable to rerouting attacks, especially those aimed at cost escalation.

02 RerouteGuard detects over 99% of rerouting attacks with minimal impact on legitimate queries.

03 Attacks use confounder gadgets to exploit the router's decision boundaries and manipulate routing decisions.

Abstract

Recent advancements in multi-model AI systems have leveraged LLM routers to reduce computational cost while maintaining response quality by assigning queries to the most appropriate model. However, as classifiers, LLM routers are vulnerable to novel adversarial attacks in the form of LLM rerouting, where adversaries prepend specially crafted triggers to user queries to manipulate routing decisions. Such attacks can lead to increased computational cost, degraded response quality, and even bypass safety guardrails, yet their security implications remain largely underexplored. In this work, we bridge this gap by systematizing LLM rerouting threats based on the adversary's objectives (i.e., cost escalation, quality hijacking, and safety bypass) and knowledge. Based on the threat taxonomy, we conduct a measurement study of real-world LLM routing systems against existing LLM rerouting…
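The rerouting attack described in the abstract, where a trigger prepended to a query changes the router's decision, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyRouter`, its keyword-based score, the `HARD_WORDS` list, the 0.25 threshold, and the trigger string are all hypothetical and are not taken from the paper, whose routers are learned classifiers and whose triggers are specially crafted. The sketch only shows the cost-escalation case, in which an easy query is pushed across the decision boundary to the expensive model.

```python
from dataclasses import dataclass

# Hypothetical keyword list standing in for a learned difficulty signal.
HARD_WORDS = {"prove", "derive", "theorem", "optimize", "complexity"}


@dataclass
class ToyRouter:
    """Toy binary router: sends 'hard' queries to the strong (costly) model."""
    threshold: float = 0.25  # illustrative value, not from the paper

    def score(self, query: str) -> float:
        # Fraction of tokens that look "hard"; a real router would use a
        # trained classifier instead of keyword matching.
        tokens = [t.strip(".,?!") for t in query.lower().split()]
        if not tokens:
            return 0.0
        hits = sum(1 for t in tokens if t in HARD_WORDS)
        return hits / len(tokens)

    def route(self, query: str) -> str:
        return "strong" if self.score(query) >= self.threshold else "weak"


def prepend_trigger(trigger: str, query: str) -> str:
    """Attack step: place the trigger in front of the unmodified user query."""
    return f"{trigger} {query}"


router = ToyRouter()
query = "What is the capital of France?"
trigger = "prove derive theorem"  # hypothetical trigger for this toy router

benign_route = router.route(query)
attacked_route = router.route(prepend_trigger(trigger, query))
print(benign_route, attacked_route)
```

In this toy setting the user's question is unchanged, yet the prepended tokens alone raise the score above the threshold, so the query is sent to the more expensive model. A detector in front of the router would need to flag the attacked input while leaving the benign one alone.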

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Network Security and Intrusion Detection