Design Patterns for Securing LLM Agents against Prompt Injections

Luca Beurer-Kellner; Beat Buesser; Ana-Maria Cre\c{t}u; Edoardo Debenedetti; Daniel Dobos; Daniel Fabian; Marc Fischer; David Froelicher; Kathrin Grosse; Daniel Naeff; Ezinwanne Ozoani; Andrew Paverd; Florian Tram\`er; V\'aclav Volhejn

arXiv:2506.08837·cs.LG·June 30, 2025·2 cites

Design Patterns for Securing LLM Agents against Prompt Injections

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre\c{t}u, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tram\`er, V\'aclav Volhejn

PDF

Open Access

TL;DR

This paper introduces design patterns to enhance the security of Large Language Model (LLM) agents against prompt injection attacks, balancing utility and security through systematic analysis and real-world case studies.

Contribution

It proposes a set of principled design patterns that provide provable resistance to prompt injections in LLM agents, addressing a critical security challenge.

Findings

01

Patterns offer provable resistance to prompt injections

02

Trade-offs between utility and security are analyzed

03

Case studies demonstrate real-world applicability

Abstract

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's resilience on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling

MethodsSparse Evolutionary Training