Machine Learning-Based Detection of MCP Attacks
Tobias Mattsson, Samuel Nyberg, Anton Borg, Ricardo Britto

TL;DR
This paper develops and evaluates machine learning models for detecting malicious MCP tools, achieving high accuracy and outperforming rule-based methods, with practical middleware implementation.
Contribution
It introduces supervised machine learning approaches for MCP attack detection, including deep learning models, and demonstrates their effectiveness over traditional rule-based solutions.
Findings
Several models achieved 100% F1-score in binary classification.
SVC and BERT models performed best in multiclass detection.
The models outperform traditional rule-based detection methods.
Abstract
The Model Context Protocol (MCP) is a new and emerging technology that extends the functionality of large language models, improving workflows but also exposing users to a new attack surface. Several studies have highlighted related security flaws, but MCP attack detection remains underexplored. To address this research gap, this study develops and evaluates a range of supervised machine learning approaches, including both traditional and deep-learning models. We evaluated the systems on the detection of malicious MCP tool descriptions in two scenarios: (1) a binary classification task distinguishing malicious from benign tools, and (2) a multiclass classification task identifying the attack type while separating benign from malicious tools. In addition to the machine learning models, we compared a rule-based approach that serves as a baseline. The results indicate that several of the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
