MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic

Sultan Zavrak

arXiv:2605.11053·cs.CR·May 14, 2026

TL;DR

MCPShield introduces a graph-based attack detection framework for LLM agent tool-call traffic, emphasizing content features and revealing significant evaluation pitfalls.

Contribution

The paper presents a graph neural network approach for detecting attacks in MCP tool-call traffic, and shows that both content embeddings and evaluation methodology strongly affect reported performance.

Findings

01

Content embeddings significantly improve detection, lifting AUROC above 0.89.

02

Naive random-split evaluation inflates AUROC by up to 26 percentage points.

03

Tree ensembles on pooled embeddings achieve AUROC of 0.975, outperforming neural models.
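
The gap between random and task-aware splits in finding 02 can be illustrated with a small sketch. It assumes that "task-stratified" means sessions from the same task never appear on both sides of the split; the `task_id` field and the `split_by_task` helper are hypothetical names chosen for illustration, not taken from the paper.

```python
import random


def split_by_task(sessions, test_fraction=0.2, seed=0):
    """Split sessions so that each task lands wholly in train or wholly in test.

    `sessions` is a list of dicts carrying a hypothetical "task_id" key.
    A naive random split over sessions would instead let near-duplicate
    sessions of one task appear on both sides, which can inflate test scores.
    """
    tasks = sorted({s["task_id"] for s in sessions})
    rng = random.Random(seed)
    rng.shuffle(tasks)
    # Hold out at least one task for testing.
    n_test = max(1, round(len(tasks) * test_fraction))
    test_tasks = set(tasks[:n_test])
    train = [s for s in sessions if s["task_id"] not in test_tasks]
    test = [s for s in sessions if s["task_id"] in test_tasks]
    return train, test
```

With a split like this, a model cannot score well merely by recognizing a task it has already seen during training, which is one plausible reason the random-split numbers come out higher.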

Abstract

The Model Context Protocol (MCP) has become a widely adopted interface for LLM agents to invoke external tools, yet learned monitoring of MCP tool-call traffic remains underexplored. In this article, MCPShield is presented as an attack detection framework for MCP tool-call traffic that encodes each agent session as a graph (tool calls as nodes, sequential and data-flow links as edges), enriches nodes with sentence-embedding features over arguments and responses, and classifies sessions as benign or attacked. Three GNN architectures (GAT, GCN, GraphSAGE), a no-graph MLP, and classical baselines (XGBoost, random forest, logistic regression, linear SVM) are evaluated, with the full architecture comparison conducted on RAS-Eval (task-stratified splits) and GraphSAGE retained as the GNN baseline on ATBench and a combined-source variant (both label-stratified). Three findings emerge. First,…
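
As a rough illustration of the session encoding described in the abstract, the sketch below turns an ordered list of tool calls into nodes and typed edges. The `ToolCall` fields, the `session_to_graph` helper, and the substring rule used to detect data flow are all assumptions made for this example; the abstract does not specify how data-flow links are derived, and the sentence-embedding node features are omitted.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str        # name of the invoked tool
    arguments: str   # serialized call arguments
    response: str    # serialized tool response


def session_to_graph(calls):
    """Return (nodes, edges) for one agent session.

    nodes: tool names, indexed by call order.
    edges: set of (src_index, dst_index, kind) tuples.
    """
    nodes = [c.tool for c in calls]
    edges = set()
    # Sequential edges link each call to the next one in the session.
    for i in range(len(calls) - 1):
        edges.add((i, i + 1, "sequential"))
    # Data-flow edges (assumed heuristic): an earlier response is reused
    # verbatim inside a later call's arguments.
    for j, later in enumerate(calls):
        for i in range(j):
            earlier = calls[i]
            if earlier.response and earlier.response in later.arguments:
                edges.add((i, j, "data_flow"))
    return nodes, edges
```

In a full pipeline, each node would additionally carry an embedding of its arguments and response before the graph is passed to a classifier.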
