AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, Ting Wang

TL;DR
AgentLAB is a new benchmark for evaluating how vulnerable LLM agents are to complex, multi-turn, long-horizon attacks across 28 realistic environments. It reveals that agents are highly susceptible to such attacks and that existing defenses are inadequate.
Contribution
This paper introduces AgentLAB, the first comprehensive benchmark for assessing LLM agent security against long-horizon attacks, including five novel attack types and 644 test cases.
Findings
LLM agents are highly vulnerable to long-horizon attacks
Existing defenses fail to mitigate multi-turn threats effectively
AgentLAB provides a standardized way to measure and improve security
Abstract
LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. Currently, AgentLAB supports five novel attack types (intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning), spanning 28 realistic agentic environments and 644 security test cases. Leveraging AgentLAB, we evaluate representative LLM agents and find that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate…
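The abstract describes attacks that unfold over several user-agent-environment turns and a benchmark that groups its test cases by attack type. The Python sketch below illustrates one way such an evaluation loop could be structured. Only the five attack-type names come from the abstract; the `AttackCase` schema, the `run_episode` and `attack_success_rate` helpers, the toy agent, and the per-type success-rate metric are hypothetical and are not AgentLAB's actual API or scoring procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class AttackType(Enum):
    # The five attack types named in the AgentLAB abstract.
    INTENT_HIJACKING = "intent hijacking"
    TOOL_CHAINING = "tool chaining"
    TASK_INJECTION = "task injection"
    OBJECTIVE_DRIFTING = "objective drifting"
    MEMORY_POISONING = "memory poisoning"


# Hypothetical agent interface: maps the conversation so far to the next reply.
Agent = Callable[[list[str]], str]


@dataclass
class AttackCase:
    # Hypothetical schema for one adversarial episode in one environment.
    case_id: str
    attack_type: AttackType
    environment: str
    attacker_turns: list[str]                    # delivered one per turn
    is_compromised: Callable[[list[str]], bool]  # judge over all agent replies


def run_episode(agent: Agent, case: AttackCase) -> bool:
    """Deliver attacker turns one at a time, then judge the whole trajectory."""
    history: list[str] = []
    replies: list[str] = []
    for turn in case.attacker_turns:
        history.append(turn)
        reply = agent(history)
        history.append(reply)
        replies.append(reply)
    return case.is_compromised(replies)


def attack_success_rate(agent: Agent, cases: list[AttackCase]) -> dict[AttackType, float]:
    """Fraction of compromised episodes, grouped by attack type (hypothetical metric)."""
    hits: dict[AttackType, int] = defaultdict(int)
    totals: dict[AttackType, int] = defaultdict(int)
    for case in cases:
        totals[case.attack_type] += 1
        hits[case.attack_type] += int(run_episode(agent, case))
    return {t: hits[t] / totals[t] for t in totals}


# Toy agent (hypothetical): it replies "ok" at first, but emits a harmful
# action once the conversation history reaches five messages (third attacker turn).
def toy_agent(history: list[str]) -> str:
    return "TRANSFER_FUNDS" if len(history) >= 5 else "ok"


def did_transfer(replies: list[str]) -> bool:
    return "TRANSFER_FUNDS" in replies


cases = [
    AttackCase("single-turn", AttackType.TASK_INJECTION, "banking", ["a"], did_transfer),
    AttackCase("multi-turn", AttackType.TASK_INJECTION, "banking", ["a", "b", "c"], did_transfer),
]
rates = attack_success_rate(toy_agent, cases)
print(rates[AttackType.TASK_INJECTION])  # 0.5
```

The judge inspects the full trajectory rather than each reply in isolation, so the sketch registers an attack that only succeeds on a later turn. This is the kind of multi-turn failure the abstract says single-turn defenses do not reliably catch.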
Taxonomy
Topics: Security and Verification in Computing · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning
