Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents
Sai Puppala, Ismail Hossain, Md Jahangir Alam, Yoonpyo Lee, Jay Yoo, Tanzim Ahad, Syed Bahauddin Alam, Sajedul Talukder

TL;DR
AgentFence introduces an architecture-centric security evaluation for deep research agents that identifies vulnerabilities across planning, memory, and tool use, and measures how often, and in which attack classes, security failures occur during multi-turn interactions.
Contribution
It defines 14 trust-boundary attack classes, proposes trace-auditable conversation breaks as the criterion for detecting failures, and measures how security varies across agent architectures.
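To make the idea of a trace-auditable break concrete, here is a minimal Python sketch. It assumes a hypothetical trace schema in which each tool call records the turn, the tool name, and the principal the agent is acting for. `ToolCall`, `Policy`, and `find_breaks` are illustrative names, not AgentFence's implementation, and only two break types are checked (unauthorized tool use and wrong-principal actions); state/objective integrity violations and attack-linked deviations would need richer trace data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation in an agent trace (hypothetical schema)."""
    turn: int
    tool: str
    principal: str  # whose instructions the agent is acting on


@dataclass
class Policy:
    """Hypothetical deployment policy: permitted tools and the legitimate principal."""
    allowed_tools: frozenset[str]
    principal: str


def find_breaks(trace: list[ToolCall], policy: Policy) -> list[tuple[int, str]]:
    """Return (turn, reason) for every call that crosses a trust boundary.

    Each call is reported at most once, with the first matching reason.
    """
    breaks = []
    for call in trace:
        if call.tool not in policy.allowed_tools:
            breaks.append((call.turn, f"unauthorized tool: {call.tool}"))
        elif call.principal != policy.principal:
            breaks.append((call.turn, f"wrong principal: {call.principal}"))
    return breaks


policy = Policy(allowed_tools=frozenset({"search", "read_file"}), principal="user")
trace = [
    ToolCall(1, "search", "user"),
    ToolCall(2, "send_email", "user"),         # tool outside the allowlist
    ToolCall(3, "read_file", "attacker_doc"),  # acting on injected instructions
]
print(find_breaks(trace, policy))
# [(2, 'unauthorized tool: send_email'), (3, 'wrong principal: attacker_doc')]
```

Because every flagged break carries the turn it occurred on, an auditor can jump straight to the offending step in the recorded trace rather than judging the final text output.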
Findings
Security break rates vary significantly across architectures.
Operational attack classes, such as Denial-of-Wallet, carry the highest break rates.
Boundary violations are the dominant cause of failures.
Abstract
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence**, an architecture-centric security evaluation that defines 14 trust-boundary attack classes spanning planning, memory, retrieval, tool use, and delegation, and detects failures via *trace-auditable conversation breaks* (unauthorized or unsafe tool use, wrong-principal actions, state/objective integrity violations, and attack-linked deviations). Holding the base model fixed, we evaluate eight agent archetypes under persistent multi-turn interaction and observe substantial architectural variation in mean security break rate (MSBR), ranging from (LangGraph) to (AutoGPT). The highest-risk classes are operational: Denial-of-Wallet (0.62…
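The abstract reports MSBR per architecture, but this excerpt does not say how it is aggregated. The Python sketch below assumes MSBR is the unweighted mean, over attack classes, of the fraction of multi-turn episodes in which a break was detected. The record format, the architecture names, and every class label other than Denial-of-Wallet are hypothetical placeholders, and the data is toy input, not results from the paper.

```python
from collections import defaultdict
from statistics import mean

# One record per multi-turn episode: (architecture, attack_class, broke).
Record = tuple[str, str, bool]


def msbr(records: list[Record]) -> dict[str, float]:
    """Mean security break rate per architecture.

    ASSUMPTION: each attack class's break rate is computed first, then the
    class rates are averaged with equal weight.
    """
    # architecture -> attack class -> list of per-episode break outcomes
    outcomes: dict[str, dict[str, list[bool]]] = defaultdict(lambda: defaultdict(list))
    for arch, attack_class, broke in records:
        outcomes[arch][attack_class].append(broke)
    return {
        arch: mean(sum(runs) / len(runs) for runs in by_class.values())
        for arch, by_class in outcomes.items()
    }


records = [
    ("arch_a", "denial_of_wallet", True),
    ("arch_a", "denial_of_wallet", False),
    ("arch_a", "memory_injection", False),
    ("arch_b", "denial_of_wallet", True),
    ("arch_b", "memory_injection", True),
]
print(msbr(records))
# {'arch_a': 0.25, 'arch_b': 1.0}
```

Averaging per class first keeps a heavily sampled class from dominating an architecture's score; pooling all episodes instead would weight each class by how often it was tested.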
