Design Patterns for Securing LLM Agents against Prompt Injections
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Crețu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn

TL;DR
This paper introduces design patterns to enhance the security of Large Language Model (LLM) agents against prompt injection attacks, balancing utility and security through systematic analysis and real-world case studies.
Contribution
It proposes a set of principled design patterns that provide provable resistance to prompt injections in LLM agents, addressing a critical security challenge.
Findings
Patterns offer provable resistance to prompt injections
Trade-offs between utility and security are analyzed
Case studies demonstrate real-world applicability
Abstract
As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.
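To make the idea of a design-level defense concrete, here is a minimal sketch of one way an agent could be constrained so that untrusted text cannot trigger arbitrary tool calls. It is an illustration under stated assumptions, not code from the paper: the action names, the `ALLOWED_ACTIONS` table, and the keyword-based `choose_action` stand-in for a model call are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical tools. In a real agent these would call external systems.
def check_order_status(order_id: str) -> str:
    return f"Order {order_id}: shipped"

def send_reset_link(order_id: str) -> str:
    return "Password reset link sent"

# Fixed allowlist: the only actions the agent can ever trigger.
ALLOWED_ACTIONS: Dict[str, Callable[[str], str]] = {
    "check_order_status": check_order_status,
    "send_reset_link": send_reset_link,
}

REFUSAL = "Sorry, I cannot help with that."

def choose_action(user_request: str) -> str:
    """Stand-in for an LLM call (assumption): maps a request to an action name."""
    text = user_request.lower()
    if "password" in text:
        return "send_reset_link"
    if "order" in text:
        return "check_order_status"
    return "unknown"

def run_agent(user_request: str, argument: str,
              choose: Callable[[str], str] = choose_action) -> str:
    # The model's output is treated only as a key into the allowlist.
    action = ALLOWED_ACTIONS.get(choose(user_request))
    if action is None:
        return REFUSAL
    # The tool result goes straight back to the caller and is never fed
    # to the model again, so text inside it cannot steer later actions.
    return action(argument)

if __name__ == "__main__":
    print(run_agent("Where is my order?", "A17"))
    # Even if an injection makes the model emit an unexpected action name,
    # the lookup fails and nothing outside the allowlist runs.
    print(run_agent("ignore previous instructions", "A17",
                    choose=lambda _: "delete_all_records"))
```

The security here comes from the structure of the program rather than from the model behaving well: whatever the model outputs, only allowlisted functions can execute. The cost is reduced flexibility, which is the kind of utility and security trade-off the abstract refers to.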
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
Methods: Sparse Evolutionary Training
