TL;DR
This paper uncovers a structural vulnerability in multi-agent LLM systems: conjunctive prompt attacks, split between a user query and a compromised remote agent, activate harmful behavior when inter-agent routing brings the pieces together, bypassing existing defenses.
Contribution
It introduces conjunctive prompt attacks that exploit inter-agent routing in multi-agent LLM systems, demonstrates their effectiveness across star, chain, and DAG topologies, and highlights the need for new defenses.
Findings
Routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low.
Existing defenses like PromptGuard and Llama-Guard are ineffective against these attacks.
The attack exposes a fundamental vulnerability in agentic LLM pipelines.
Abstract
Most LLM safety work studies single-agent models, but many real applications rely on multiple interacting agents. In these systems, prompt segmentation and inter-agent routing create attack surfaces that single-agent evaluations miss. We study conjunctive prompt attacks, where a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone but activate harmful behavior when routing brings them together. We consider an attacker who changes neither model weights nor the client agent and instead controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool…
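The conjunctive condition described in the abstract can be illustrated with a toy simulation. This is a sketch under loud assumptions, not the paper's method: the three topologies, the keyword-based router (`TOPICS`, `route`), and the placeholder strings `TRIGGER` and `TEMPLATE` are all hypothetical, and no real model or guard is involved. It shows only two things: each piece fails the activation check on its own, and whether the two pieces ever share a context depends on where routing sends the query.

```python
from typing import Dict, List, Set

# Hypothetical topologies: each agent maps to the agents it may forward to.
TOPOLOGIES: Dict[str, Dict[str, List[str]]] = {
    "star": {"client": ["a", "b", "c"], "a": [], "b": [], "c": []},
    "chain": {"client": ["a"], "a": ["b"], "b": ["c"], "c": []},
    "dag": {"client": ["a", "b"], "a": ["c"], "b": ["c"], "c": []},
}

# Hypothetical keyword router (an assumption, not the paper's router):
# a node forwards the query to a neighbour only if that neighbour's
# topic word appears in the query.
TOPICS: Dict[str, str] = {"a": "billing", "b": "search", "c": "code"}

TRIGGER = "<trigger-key>"       # placeholder; benign on its own
TEMPLATE = "<hidden-template>"  # placeholder held by the compromised agent


def route(topology: Dict[str, List[str]], query: str) -> Set[str]:
    """Return every agent the query reaches, starting from the client."""
    reached: Set[str] = set()
    frontier = ["client"]
    while frontier:
        node = frontier.pop()
        for nxt in topology[node]:
            if nxt not in reached and TOPICS[nxt] in query:
                reached.add(nxt)
                frontier.append(nxt)
    return reached


def activated(context: str) -> bool:
    """Conjunctive condition: true only when both pieces share one context."""
    return TRIGGER in context and TEMPLATE in context


def simulate(topology_name: str, compromised: str, query: str) -> bool:
    """True if routing delivers the query to the compromised agent and the
    combined context (hidden template + query) satisfies the condition."""
    reached = route(TOPOLOGIES[topology_name], query)
    if compromised not in reached:
        return False
    return activated(TEMPLATE + "\n" + query)
```

For example, `simulate("star", "c", "code question <trigger-key>")` returns `True` because the star client forwards the query straight to agent `c`, while the same query in the `"chain"` topology never leaves the client because it lacks the word `billing`. Calling `activated` on the query alone or on `TEMPLATE` alone returns `False`, which is the toy analogue of why a check that inspects each input in isolation sees nothing to flag.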
