MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic
Sultan Zavrak

TL;DR
MCPShield is a graph-based attack detection framework for LLM agent tool-call traffic. It emphasizes content features and exposes significant evaluation pitfalls.
Contribution
The paper presents a graph neural network approach for detecting attacks in MCP tool-call traffic and highlights the importance of content embeddings and evaluation methodology.
Findings
Content embeddings substantially improve detection, lifting AUROC above 0.89.
Naive random-split evaluation inflates AUROC by up to 26 percentage points.
Tree ensembles on pooled embeddings achieve an AUROC of 0.975, outperforming neural models.
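
The gap between naive random splits and task-stratified splits can be illustrated with a small sketch. It assumes, as one plausible reading that this summary does not confirm, that a task-aware split holds out whole tasks so no task contributes sessions to both train and test. The `split_by_task` helper, the `task` and `label` keys, and the toy data are all hypothetical.

```python
import random

def split_by_task(sessions, test_fraction=0.25, seed=0):
    """Hold out whole tasks (hypothetical helper, not the paper's code).

    Every session from a held-out task goes to the test set, so a model
    cannot score well just by recognising a task it saw during training.
    """
    tasks = sorted({s["task"] for s in sessions})
    rng = random.Random(seed)
    rng.shuffle(tasks)
    n_test = max(1, round(len(tasks) * test_fraction))
    test_tasks = set(tasks[:n_test])
    train = [s for s in sessions if s["task"] not in test_tasks]
    test = [s for s in sessions if s["task"] in test_tasks]
    return train, test

# Toy data: 12 sessions spread over 4 tasks (illustrative only).
sessions = [{"task": f"task{i % 4}", "label": i % 2} for i in range(12)]
train, test = split_by_task(sessions)

train_tasks = {s["task"] for s in train}
test_tasks = {s["task"] for s in test}
print("shared tasks:", train_tasks & test_tasks)
```

A per-session random split over the same data would usually place sessions from every task on both sides, which is the kind of leakage a task-aware protocol is meant to avoid.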
Abstract
The Model Context Protocol (MCP) has become a widely adopted interface for LLM agents to invoke external tools, yet learned monitoring of MCP tool-call traffic remains underexplored. In this article, MCPShield is presented as an attack detection framework for MCP tool-call traffic that encodes each agent session as a graph (tool calls as nodes, sequential and data-flow links as edges), enriches nodes with sentence-embedding features over arguments and responses, and classifies sessions as benign or attacked. Three GNN architectures (GAT, GCN, GraphSAGE), a no-graph MLP, and classical baselines (XGBoost, random forest, logistic regression, linear SVM) are evaluated, with the full architecture comparison conducted on RAS-Eval (task-stratified splits) and GraphSAGE retained as the GNN baseline on ATBench and a combined-source variant (both label-stratified). Three findings emerge. First,…
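
The session-to-graph encoding described in the abstract (tool calls as nodes, sequential and data-flow links as edges) might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `ToolCall` record, the `build_session_graph` helper, and the rule that a data-flow edge exists when an earlier response string reappears inside a later call's argument are all hypothetical. Sentence-embedding node features are omitted.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    # Hypothetical record for one tool call in an agent session.
    tool: str
    arguments: dict
    response: str

def build_session_graph(calls):
    """Return (nodes, edges) for one session.

    Nodes are tool names indexed by call order. Edges are
    (source_index, target_index, kind) tuples.
    """
    nodes = [c.tool for c in calls]
    edges = set()
    # Sequential edges link each call to the next one.
    for i in range(len(calls) - 1):
        edges.add((i, i + 1, "sequential"))
    # Assumed data-flow rule: an earlier response appears verbatim
    # inside one of the later call's argument values.
    for j, later in enumerate(calls):
        arg_values = [str(v) for v in later.arguments.values()]
        for i in range(j):
            resp = calls[i].response
            if resp and any(resp in v for v in arg_values):
                edges.add((i, j, "data_flow"))
    return nodes, sorted(edges)

# Toy session (illustrative values only).
session = [
    ToolCall("read_file", {"path": "notes.txt"}, "token-123"),
    ToolCall("search", {"query": "weather"}, "sunny"),
    ToolCall("send_email", {"body": "value: token-123"}, "ok"),
]
nodes, edges = build_session_graph(session)
print(nodes)
print(edges)
```

In a full pipeline, each node would additionally carry an embedding of its arguments and response, and the resulting graph would be passed to a session-level classifier.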
