Automating Agent Hijacking via Structural Template Injection
Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li

TL;DR
This paper introduces Phantom, an automated framework that exploits structural template injection to hijack LLM agents, revealing significant vulnerabilities in commercial systems and outperforming existing attack methods.
Contribution
Phantom is the first automated attack framework leveraging structured template injection and search techniques to effectively hijack LLM agents, improving success rates and transferability.
Findings
Achieved high attack success rates on multiple LLMs.
Identified over 70 vulnerabilities in commercial products.
Outperformed baseline attacks in efficiency and effectiveness.
Abstract
Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Topic Modeling · Spam and Phishing Detection
