TL;DR
This paper introduces R$^2$A, an adversarial attack method that misleads black-box LLM routers into selecting more expensive models, highlighting security vulnerabilities in cost-aware routing systems.
Contribution
It presents a novel black-box attack technique using surrogate models and suffix optimization to manipulate LLM routing strategies.
Findings
R$^2$A significantly increases routing to expensive models across various systems.
The attack is effective on both open-source and commercial routing systems.
The method works without white-box access or heuristic prompts.
Abstract
Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the routing strategy introduces a new security concern: adversaries may manipulate the router to consistently select expensive high-capability models. Existing routing attacks depend on either white-box access or heuristic prompts, rendering them ineffective in real-world black-box scenarios. In this work, we propose R$^2$A, which aims to mislead black-box LLM routers into selecting expensive models via adversarial suffix optimization. Specifically, R$^2$A deploys a hybrid ensemble surrogate router to mimic the black-box router. A suffix optimization algorithm is further adapted for the ensemble-based surrogate. Extensive experiments on multiple open-source and commercial routing systems demonstrate that R$^2$A significantly increases the routing rate to…
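To make the idea in the abstract concrete, here is a minimal sketch of optimizing a suffix against an ensemble of surrogate routers. It is not the paper's algorithm. The keyword-based scorers, the plain averaging in `ensemble_score`, the toy vocabulary, and the random-search loop in `optimize_suffix` are all assumptions introduced for illustration. The actual surrogate construction and suffix optimizer in R$^2$A are not described in the text above.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A surrogate scorer maps a query to an estimated probability (0..1) that a
# router would send it to the expensive model. Hypothetical interface.
Scorer = Callable[[str], float]


def keyword_scorer(keywords: Sequence[str]) -> Scorer:
    """Toy surrogate (assumption): fraction of 'hard-looking' keywords present."""
    def score(query: str) -> float:
        words = set(query.lower().split())
        return sum(k in words for k in keywords) / len(keywords)
    return score


def ensemble_score(scorers: Sequence[Scorer], query: str) -> float:
    """Average the surrogate scores; a stand-in for an ensemble surrogate router."""
    return sum(s(query) for s in scorers) / len(scorers)


def optimize_suffix(
    query: str,
    scorers: Sequence[Scorer],
    vocab: Sequence[str],
    suffix_len: int = 4,
    iters: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    """Random-search suffix optimization (swapped in for illustration only).

    Repeatedly replaces one suffix token at random and keeps the change
    whenever the ensemble score of query + suffix strictly increases.
    """
    rng = random.Random(seed)
    suffix: List[str] = [rng.choice(vocab) for _ in range(suffix_len)]
    best = ensemble_score(scorers, query + " " + " ".join(suffix))
    for _ in range(iters):
        candidate = list(suffix)
        candidate[rng.randrange(suffix_len)] = rng.choice(vocab)
        score = ensemble_score(scorers, query + " " + " ".join(candidate))
        if score > best:
            suffix, best = candidate, score
    return " ".join(suffix), best


if __name__ == "__main__":
    surrogates = [
        keyword_scorer(["prove", "theorem", "rigorous"]),
        keyword_scorer(["derive", "complexity", "proof"]),
    ]
    vocab = ["prove", "theorem", "rigorous", "derive",
             "complexity", "proof", "please", "thanks"]
    query = "what is 2 + 2"
    suffix, score = optimize_suffix(query, surrogates, vocab)
    print("baseline:", ensemble_score(surrogates, query))
    print("suffix:", suffix, "score:", score)
```

In a black-box setting the suffix found on the surrogates would then be appended to the query sent to the real router, in the hope that it transfers. Whether it does depends on how well the surrogates mimic the target, which this toy example does not model.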
