Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Sai Puppala; Ismail Hossain; Md Jahangir Alam; Yoonpyo Lee; Jay Yoo; Tanzim Ahad; Syed Bahauddin Alam; Sajedul Talukder

arXiv:2602.07652 · cs.CR · February 10, 2026

TL;DR

AgentFence is an architecture-centric security evaluation for deep research agents. It maps vulnerabilities across planning, memory, retrieval, tool use, and delegation, and measures how security break rates and failure classes vary across agent architectures under persistent multi-turn interaction.

Contribution

AgentFence defines 14 trust-boundary attack classes, detects failures through trace-auditable conversation breaks, and compares security across eight agent archetypes with the base model held fixed.
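To make the idea of a trace-auditable break concrete, here is a minimal sketch in Python. It is not the paper's detector: the `ToolCall` schema, the allow-list, and the per-conversation rate are all assumptions made for illustration, and it covers only two of the break types the abstract names (unauthorized tool use and wrong-principal actions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation recorded in an agent trace (hypothetical schema)."""
    tool: str       # name of the tool the agent invoked
    principal: str  # party on whose behalf the call was made


def is_break(trace, allowed_tools, expected_principal):
    """Return True if any call uses a disallowed tool or acts for the wrong principal."""
    return any(
        call.tool not in allowed_tools or call.principal != expected_principal
        for call in trace
    )


def break_rate(traces, allowed_tools, expected_principal):
    """Fraction of conversations whose trace contains at least one break.

    Assumption: a simple per-conversation average, not the paper's MSBR definition.
    """
    if not traces:
        raise ValueError("need at least one trace")
    flags = [is_break(t, allowed_tools, expected_principal) for t in traces]
    return sum(flags) / len(flags)


# Four toy conversations: the second calls a disallowed tool,
# the third acts on behalf of an untrusted web page.
traces = [
    [ToolCall("search", "user")],
    [ToolCall("search", "user"), ToolCall("shell", "user")],
    [ToolCall("search", "web_page")],
    [ToolCall("search", "user"), ToolCall("read_file", "user")],
]
rate = break_rate(traces, allowed_tools={"search", "read_file"}, expected_principal="user")
print(rate)  # 0.5
```

Because the check reads only the recorded trace, the verdict for each conversation can be audited after the fact without inspecting the model's text output.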

Findings

01

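With the base model held fixed, mean security break rate (MSBR) varies substantially across architectures, from $0.29 \pm 0.04$ (LangGraph) to $0.51 \pm 0.07$ (AutoGPT).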

02

The highest-risk attack classes are operational, such as Denial-of-Wallet.

03

Boundary violations are the dominant cause of failures.

Abstract

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence**, an architecture-centric security evaluation that defines 14 trust-boundary attack classes spanning planning, memory, retrieval, tool use, and delegation, and detects failures via *trace-auditable conversation breaks* (unauthorized or unsafe tool use, wrong-principal actions, state/objective integrity violations, and attack-linked deviations). Holding the base model fixed, we evaluate eight agent archetypes under persistent multi-turn interaction and observe substantial architectural variation in mean security break rate (MSBR), ranging from $0.29 \pm 0.04$ (LangGraph) to $0.51 \pm 0.07$ (AutoGPT). The highest-risk classes are operational: Denial-of-Wallet ($0.62…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Scientific Computing and Data Management