Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation

Saeid Jamshidi; Arghavan Moradi Dakhel; Kawser Wazed Nafi; Foutse Khomh

arXiv:2512.06556·cs.CR·May 22, 2026

Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation

Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, Foutse Khomh

PDF

TL;DR

This paper identifies a new semantic attack surface in tool-augmented LLMs through descriptor manipulation and proposes a layered defense to mitigate these threats, improving system robustness.

Contribution

It formalizes descriptor-driven attack classes and introduces a comprehensive mitigation strategy without retraining models, validated across multiple LLMs and prompting strategies.

Findings

01

Descriptor manipulation can cause unsafe tool invocations in up to 36% of trials.

02

The proposed mitigation reduces unsafe invocations to 15%.

03

Cross-model analysis shows varying robustness and sensitivity to attacks.

Abstract

The Model Context Protocol (MCP) enables Large Language Models (LLMs) to interact with external tools via tool descriptors, thereby extending their capabilities for task execution, autonomous decision-making, and multi-agent coordination. Existing MCP deployments treat tool descriptors as trusted metadata, despite their direct integration into the LLM reasoning context. This introduces a previously underexplored semantic attack surface. Current defenses primarily target prompt injection, neglecting descriptor-level manipulation that can bias tool selection and downstream reasoning. To address this gap, we formalize three descriptor-driven attack classes: Tool Poisoning, Shadowing, and Rug Pull. We propose a layered defense solution that integrates descriptor integrity verification, pre-context semantic vetting with an auxiliary LLM, and lightweight runtime guardrails, without requiring…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSecurity and Verification in Computing · Adversarial Robustness in Machine Learning · Scientific Computing and Data Management