MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents

Zhenhong Zhou; Yuanhe Zhang; Hongwei Cai; Moayad Aloqaily; Ouns Bouachir; Linsey Pang; Prakhar Mehrotra; Kun Wang; Qingsong Wen

arXiv:2602.14281·cs.CR·February 25, 2026

Open Access

TL;DR

MCPShield is a security layer for LLM agents that use the Model Context Protocol. It improves trust calibration and defends against MCP-based attacks through metadata-guided validation and runtime reasoning.

Contribution

The paper presents MCPShield, a novel plug-in security cognition layer that improves the security of MCP-based agent tool invocation through human-inspired validation and runtime analysis.

Findings

01. Successfully defends against six novel MCP-based attack scenarios

02. Maintains low false positives on benign servers

03. Imposes minimal deployment overhead

Abstract

The Model Context Protocol (MCP) standardizes tool use for LLM-based agents and enables third-party servers. This openness introduces a security misalignment: agents implicitly trust tools exposed by potentially untrusted MCP servers. Despite MCP's utility, existing agents typically offer limited validation for third-party MCP servers. As a result, agents remain vulnerable to MCP-based attacks that exploit the misalignment between agents and servers throughout the tool invocation lifecycle. In this paper, we propose MCPShield, a plug-in security cognition layer that mitigates this misalignment and ensures agent security when invoking MCP-based tools. Drawing inspiration from human experience-driven tool validation, MCPShield helps the agent form security cognition through metadata-guided probing before invocation. Our method constrains execution within controlled…

Taxonomy

Topics: Access Control and Trust · Mobile Agent-Based Network Management · Security and Verification in Computing