AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?

Hao Li; Ruoyao Wen; Shanghao Shi; Ning Zhang; Yevgeniy Vorobeychik; Chaowei Xiao

arXiv:2602.03117 · cs.CR · May 8, 2026

TL;DR

AgentDyn is a new benchmark with 60 dynamic, open-ended tasks designed to evaluate agent security defenses in realistic environments, revealing current defenses' limitations.

Contribution

This work introduces AgentDyn, a comprehensive benchmark with dynamic tasks and helpful instructions, addressing flaws in existing static agent security benchmarks.

Findings

01. Most existing defenses are insecure or over-conservative.

02. Current benchmarks do not reflect real-world dynamic environments.

03. AgentDyn exposes weaknesses in state-of-the-art defenses.

Abstract

AI agents that autonomously interact with external tools and environments have shown great promise across real-world applications. However, their reliance on external data exposes them to serious indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behaviors. To mitigate this threat, a growing number of defenses have been proposed and evaluated under existing agent security benchmarks. These benchmarks provide structured environments for comparing attacks and defenses, and have become a key driver for defense design and optimization. However, as agents move toward more complex and open-ended real-world deployments, there is a pressing need for benchmarks to become more adaptive and better reflect the dynamic environments faced by real-world agentic systems. In this work, we reveal three fundamental flaws in the current benchmarks…

Code & Models

Repositories

leolee99/AgentDyn (GitHub)
