Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Wenhui Zhu; Xuanzhao Dong; Xiwen Chen; Rui Cai; Peijie Qiu; Zhipeng Wang; Oana Frunza; Shao Tang; Jindong Gu; Yalin Wang

arXiv:2604.03870·cs.CL·April 7, 2026

Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Rui Cai, Peijie Qiu, Zhipeng Wang, Oana Frunza, Shao Tang, Jindong Gu, Yalin Wang

PDF

TL;DR

This paper uncovers systemic vulnerabilities in multi-agent LLM systems to indirect prompt injections, demonstrating their fragility and proposing a representation engineering detection method to improve security.

Contribution

It systematically evaluates defenses against sophisticated IPI attacks in dynamic multi-step environments and introduces RepE as an effective detection strategy.

Findings

01

Advanced IPI attacks bypass most defenses

02

Agents execute malicious actions almost instantly

03

Representation engineering detects unauthorized actions effectively

Abstract

The rapid deployment of open-source frameworks has significantly advanced the development of modern multi-agent systems. However, expanded action spaces, including uncontrolled privilege exposure and hidden inter-system interactions, pose severe security challenges. Specifically, Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthorized actions such as data exfiltration during normal operations. While current security evaluations predominantly rely on isolated single-turn benchmarks, the systemic vulnerabilities of these agents within complex dynamic environments remain critically underexplored. To bridge this gap, we systematically evaluate six defense strategies against four sophisticated IPI attack vectors across nine LLM backbones. Crucially, we conduct our evaluation entirely within dynamic multi-step tool-calling…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.