Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang

TL;DR
This paper introduces Agent Security Bench (ASB), a comprehensive framework for formalizing, benchmarking, and evaluating security attacks and defenses on LLM-based agents across various scenarios and metrics.
Contribution
The paper presents a novel benchmarking framework, ASB, that systematically evaluates attacks and defenses on LLM-based agents in diverse real-world scenarios.
Findings
High attack success rate of 84.30% across scenarios.
Current defenses show limited effectiveness.
Critical vulnerabilities identified in agent operation stages.
Abstract
Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses across 13 LLM backbones. Our…
Peer Reviews
Decision·ICLR 2025 Poster
- The ASB framework is comprehensive, covering diverse attack types (e.g., prompt injection, memory poisoning, backdoor) and multiple defense strategies. ASB includes ten distinct agent scenarios, each with tools and tasks tailored for real-world applications. - It provides structured metrics, such as Attack Success Rate (ASR) and Refuse Rate (RR), to evaluate the effectiveness of both attacks and defenses. - The results of Figure 2 is interesting. Larger Models Tend to be More Fragile. - Extens
The article lacks a critical analysis of the new findings and fails to compare the results with those of previous similar studies. For instance, in the text"Larger Models Tend to be More Fragile," the author states, "We visualize the correlation between backbone LLM leaderboard quality (Analysis, 2024) and average ASR across various attacks in Fig. 2. Larger models usually have higher ASR because their stronger capabilities make them more likely to follow attacker instructions." If the conclusio
This paper examines an important and timely area investigating security vulnerabilities in LLM systems. The authors evaluate and analyze a comprehensive set of both attack vectors and relative defenses, providing extensive testing coverage. Additionally, the paper makes a valuable contribution by unifying different types of prompt injection and poisoning attacks under a single cohesive framework for analysis (including an attack which leverages the different attack vectors, which this works show
- Why do the authors introduce the concept of "Observation Prompt Injections" when there is already the term "Indirect Prompt Injection" which is commonly used in the literature? -The motivation of PoT Attacks is unclear: what's an example of an adversary who controls the system prompt but not API? If the adversary is the API provider, there is nothing the user can do as the adversary can just make the API output arbitrary actions based on the inputs. - The evaluation is not necessarily the most
+ The paper presents a comprehensive evaluation framework covering a range of attacks, metrics, models, and scenarios. + It systemizes various attacks at different stages of LLM-based agents. + The evaluation considers nearly 90,000 test cases, quantifying the vulnerabilities of existing LLM-based agents.
+ The new insights provided by the evaluation are limited. The main conclusion that LLM-based agents are vulnerable to various malicious manipulations at different stages is not new. It has been validated in previous studies. I expect to see that developing a comprehensive evaluation platform can provide new insights, which are impossible otherwise. For example, through a comparative study of different attacks/defenses, it may highlight their strengths/limitations and outline their design spectr
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDigital Rights Management and Security · Access Control and Trust · Multi-Agent Systems and Negotiation
