Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems
Aaditya Pai

TL;DR
This paper identifies a blind spot in injection detection for multi-agent LLM systems: payloads camouflaged in the target document's domain vocabulary evade detectors calibrated on static, template-based attacks. It formalizes the resulting gap and proposes a framework to analyze it.
Contribution
The author formalizes the Camouflage Detection Gap (CDG), shows that it is large and statistically significant across two model families and 45 tasks in three domains, and releases tools to evaluate and address the vulnerability.
Findings
Detection rates drop from 93.8% to 9.7% for camouflaged payloads on Llama 3.1 8B.
Camouflaged payloads achieve significantly higher attack success in multi-agent debate architectures.
Targeted detector augmentation offers only limited remediation, which suggests the vulnerability is architectural.
Abstract
Injection detectors deployed to protect LLM agents are calibrated on static, template-based payloads that announce themselves as override directives. We identify a systematic blind spot: when payloads are generated to mimic the domain vocabulary and authority structures of the target document (what we call domain-camouflaged injection), standard detectors fail to flag them, with detection rates dropping from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash. We formalize this as the Camouflage Detection Gap (CDG), the difference in injection detection rate between static and camouflaged payloads. Across 45 tasks spanning three domains and two model families, CDG is large and statistically significant (chi^2 = 38.03, p < 0.001 for Llama; chi^2 = 17.05, p < 0.001 for Gemini), with zero reverse discordant pairs in either case. We additionally evaluate Llama Guard 3, a…
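As a rough illustration of the CDG definition above, the following Python sketch computes the gap from paired detector outcomes, where each task contributes one static and one camouflaged payload. The function names and the toy data are hypothetical and not taken from the paper. The abstract's mention of "discordant pairs" suggests a paired test; McNemar's test with continuity correction is used here as an assumption, since this summary does not name the test.

```python
from typing import Sequence


def detection_rate(flags: Sequence[bool]) -> float:
    """Fraction of payloads the detector flagged as injections."""
    return sum(bool(f) for f in flags) / len(flags)


def camouflage_detection_gap(static_flags: Sequence[bool],
                             camo_flags: Sequence[bool]) -> float:
    """CDG: detection rate on static payloads minus rate on camouflaged ones."""
    return detection_rate(static_flags) - detection_rate(camo_flags)


def mcnemar_chi2(static_flags: Sequence[bool],
                 camo_flags: Sequence[bool]) -> float:
    """McNemar chi-square with continuity correction (assumed test).

    b counts pairs detected as static but missed as camouflaged;
    c counts the reverse discordant pairs.
    """
    pairs = list(zip(static_flags, camo_flags))
    b = sum(1 for s, c in pairs if s and not c)
    c = sum(1 for s, c in pairs if c and not s)
    if b + c == 0:
        return 0.0  # no discordant pairs, so no evidence of a difference
    return (abs(b - c) - 1) ** 2 / (b + c)


# Toy data for illustration only (not the paper's results):
# 10 tasks, all static payloads detected, 4 of 10 camouflaged ones detected.
static = [True] * 10
camo = [True] * 4 + [False] * 6

cdg = camouflage_detection_gap(static, camo)
chi2 = mcnemar_chi2(static, camo)
print(f"CDG = {cdg:.2f}, chi^2 = {chi2:.2f}")
```

Applied to the rates reported in the abstract, the same subtraction gives a gap of 84.1 percentage points for Llama 3.1 8B (93.8% minus 9.7%) and 44.4 percentage points for Gemini 2.0 Flash (100% minus 55.6%).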
