From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang; Zhijia Zhao; Bihuan Chen; Susheng Wu; Zhuotong Zhou; Yiheng Cao; Xin Hu; Xin Peng

arXiv:2604.01905 · cs.CR · May 20, 2026

TL;DR

This paper introduces a component-centric approach to understanding and detecting malicious MCP servers in LLM systems, including a new dataset and a behavioral deviation detector called Connor.

Contribution

It presents the first component-centric PoC dataset of malicious MCP servers and a novel two-stage detection method, Connor, for identifying malicious behaviors.

Findings

01. Component position influences attack success rate.

02. Multi-component attacks are more effective than single-component attacks.

03. Connor achieves 94.6% F1-score, outperforming existing methods.

Abstract

The Model Context Protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new attack vectors. Despite MCP's growing adoption, existing MCP security studies classify attacks by their observable effects, which obscures how attacks behave across different MCP server components and overlooks multi-component attack chains. Meanwhile, existing defenses are less effective against multi-component attacks or previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric PoC dataset of 114 malicious MCP servers, in which attacks are realized by manipulating MCP components and their compositions. We evaluate these attacks' effectiveness across two MCP hosts and five LLMs, and uncover that (1)…
