MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
Xuanjun Zong, Zhiqi Shen, Lei Wang, Yunshi Lan, Chao Yang

TL;DR
MCP-SafetyBench is a comprehensive real-world benchmark designed to evaluate the safety of large language models operating through the Model Context Protocol across multiple domains and attack types.
Contribution
It introduces a new benchmark with real MCP servers, covering multi-turn, multi-domain safety evaluation and a taxonomy of 20 attack types, addressing gaps in existing safety assessments.
Findings
All evaluated models are vulnerable to MCP attacks.
Models exhibit a safety-utility trade-off.
The benchmark reveals the need for stronger defenses.
Abstract
Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using…
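To make the structure described in the abstract concrete, here is a minimal Python sketch of how results could be grouped by attack side. The five domains and the three sides (server, host, user) come from the abstract. Everything else is an assumption made for illustration: the `EpisodeResult` record, the `rates_by_side` helper, the two rates it reports, and the toy data are hypothetical and are not the benchmark's actual schema, metrics, or results.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    # The five domains named in the abstract.
    BROWSER_AUTOMATION = "browser automation"
    FINANCIAL_ANALYSIS = "financial analysis"
    LOCATION_NAVIGATION = "location navigation"
    REPOSITORY_MANAGEMENT = "repository management"
    WEB_SEARCH = "web search"


class AttackSide(Enum):
    # The three sides the attack taxonomy spans, per the abstract.
    SERVER = "server"
    HOST = "host"
    USER = "user"


@dataclass(frozen=True)
class EpisodeResult:
    # Hypothetical record of one evaluated episode (not the benchmark's schema).
    domain: Domain
    side: AttackSide
    attack_succeeded: bool
    task_completed: bool


def rates_by_side(results):
    """Return {side: (attack_success_rate, task_completion_rate)}.

    Both rates are assumed, illustrative metrics.
    """
    grouped = defaultdict(list)
    for r in results:
        grouped[r.side].append(r)
    summary = {}
    for side, rows in grouped.items():
        n = len(rows)
        attacked = sum(r.attack_succeeded for r in rows) / n
        completed = sum(r.task_completed for r in rows) / n
        summary[side] = (attacked, completed)
    return summary


# Made-up toy data, for illustration only.
toy = [
    EpisodeResult(Domain.WEB_SEARCH, AttackSide.SERVER, True, True),
    EpisodeResult(Domain.WEB_SEARCH, AttackSide.SERVER, False, True),
    EpisodeResult(Domain.FINANCIAL_ANALYSIS, AttackSide.USER, False, False),
]
summary = rates_by_side(toy)
```

Reporting an attack rate next to a completion rate is one simple way to look at the safety-utility trade-off mentioned in the findings. The paper's own metrics may be defined differently.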
Peer Reviews
Decision: ICLR 2026 Poster
Weakness: The paper could better situate its contributions within broader LLM red-teaming and safety evaluation frameworks. For example, "Operationalizing a Threat Model for Red-Teaming LLMs" offers a structured way to define adversarial capabilities and goals, which could help formalize the threat assumptions behind MCP-SafetyBench. Similarly, works like "Red-Teaming for Generative AI: Silver Bullet or Security Theater?" and "Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In" provi
1. The paper introduces a comprehensive MCP safety benchmark that covers a wide range of attack types.
2. The paper proposes a clear and compact taxonomy of MCP vulnerabilities, which is crucial for safety benchmarking.
3. The paper evaluates a wide range of open-source and proprietary models.
4. The paper is well-written.
1. The paper does not sufficiently justify or elaborate on the task selection process. It is unclear why the task source (MCP-Universe) is trusted, why these five domains are the focus, how specific tasks are chosen, and whether the selection introduces systematic bias.
2. The experimental settings (token/runtime/budget limits, temperature, number of turns, number of repetitions) are not clearly stated or motivated.
3. The empirical analysis lacks the rigor expected of a benchmark paper. For exa
The paper is well written and does well to condense the (now substantial) body of MCP security works, spanning previous safety benchmarks and MCP-targeted attacks. The paper also does well to situate its contributions relative to previous work. The benchmark itself is robust (particularly compared to existing benchmarks, which are substantially smaller in either scope or automated use), with the multi-turn capabilities enabling some of the more complicated and subversive MCP attacks released o
As a benchmark paper, the manuscript lacks novelty with respect to nascent attacks and exploits. However, the achievement of such a comprehensive benchmark, and its timeliness given the ever-growing adoption of MCP-powered agents, greatly outweigh this.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Software System Performance and Reliability
