Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang; Drew Zagieboylo; Shaona Ghosh; Sanjay Kariyappa; Kai Greshake; Hanshen Xiao; Chaowei Xiao; G. Edward Suh

arXiv:2603.30016·cs.CR·April 1, 2026

TL;DR

This position paper advocates system-level defenses that protect AI agents, most of which are powered by LLMs, from indirect prompt injection attacks by combining security policies, constraints on model decisions, and human oversight.

Contribution

It articulates three positions on system-level defense: dynamic replanning with security policy updates, model decisions that are strictly constrained by the system design, and personalization and human interaction for inherently ambiguous cases.

Findings

01. Existing benchmarks may give a false sense of security.

02. System-level defenses can structure and control agent behaviors effectively.

03. Security should incorporate rule-based checks and human oversight.
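
To make the findings above concrete, here is a minimal sketch of a rule-based check combined with human oversight. It is not the paper's design. The tool names, the trusted-target list, and the `gate` and `rule_check` functions are all hypothetical, chosen only to illustrate a deterministic policy that decides from the structured tool call alone and defers ambiguous cases to a person.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_HUMAN = "ask_human"


@dataclass(frozen=True)
class ToolCall:
    tool: str    # hypothetical tool name, e.g. "send_email"
    target: str  # hypothetical argument, e.g. a recipient address


# Hypothetical policy data (assumptions for illustration only).
SAFE_TOOLS = {"read_calendar", "search_docs"}
TRUSTED_TARGETS = {"alice@example.com"}


def rule_check(call: ToolCall) -> Verdict:
    """Deterministic check that looks only at the structured call,
    never at untrusted text retrieved by the agent."""
    if call.tool in SAFE_TOOLS:
        return Verdict.ALLOW
    if call.tool == "send_email":
        if call.target in TRUSTED_TARGETS:
            return Verdict.ALLOW
        # Ambiguous case: the rule cannot decide, so defer to the user.
        return Verdict.ASK_HUMAN
    # Anything the policy does not recognize is denied by default.
    return Verdict.DENY


def gate(call: ToolCall, confirm: Callable[[ToolCall], bool]) -> bool:
    """Return True if the call may run; `confirm` stands in for a human prompt."""
    verdict = rule_check(call)
    if verdict is Verdict.ASK_HUMAN:
        return confirm(call)
    return verdict is Verdict.ALLOW
```

In this sketch an injected instruction cannot widen the policy, because `rule_check` never sees the untrusted content, and an email to an unknown recipient runs only if the `confirm` callback approves it.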

Abstract

AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a…
