AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunovi\'c, Luca, Beurer-Kellner, Marc Fischer, Florian Tram\`er

TL;DR
AgentDojo is an extensible evaluation environment designed to test and improve the robustness of AI agents against prompt injection attacks, using a diverse set of tasks, attack scenarios, and defenses.
Contribution
This paper introduces AgentDojo, a dynamic and extensible framework for evaluating AI agent robustness against prompt injection attacks across multiple realistic tasks.
Findings
State-of-the-art LLMs struggle with many tasks even without attacks.
Existing prompt injection attacks can compromise some security properties.
AgentDojo reveals gaps in current defenses and attack methods.
Abstract
AI agents aim to solve complex tasks by combining text-based reasoning with external tool calls. Unfortunately, AI agents are vulnerable to prompt injection attacks where data returned by external tools hijacks the agent to execute malicious tasks. To measure the adversarial robustness of AI agents, we introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data. To capture the evolving nature of attacks and defenses, AgentDojo is not a static test suite, but rather an extensible environment for designing and evaluating new agent tasks, defenses, and adaptive attacks. We populate the environment with 97 realistic tasks (e.g., managing an email client, navigating an e-banking website, or making travel bookings), 629 security test cases, and various attack and defense paradigms from the literature. We find that AgentDojo poses a challenge for both attacks…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsNetwork Security and Intrusion Detection · Multi-Agent Systems and Negotiation · Advanced Malware Detection Techniques
MethodsEmirates Airlines Office in Dubai
