Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
David Noever

TL;DR
This paper uncovers a new class of vulnerabilities in MCP-based agent systems where benign tasks can be combined to produce harmful behaviors, highlighting the need for cross-domain security measures.
Contribution
It systematically analyzes compositional attack chains in MCP agents, demonstrating how coordinated actions across services can breach security boundaries and cause harm.
Findings
The 95 agents tested can chain legitimate operations into attacks
Current MCP architectures lack effective cross-domain security measures
Agents can achieve targeted harm through service orchestration
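To make the findings above concrete, here is a toy sketch of why per-service authorization alone misses a compositional chain. It is an illustration under stated assumptions, not the paper's method or any part of the MCP specification: the service names echo those listed in the abstract, while the action names, the `Call` type, the allowlist, and the `max_services` threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-service allowlists. Every action listed here is
# individually authorized; nothing in this table is taken from MCP itself.
ALLOWED = {
    "browser": {"search"},
    "location": {"lookup"},
    "finance": {"analyze"},
    "deploy": {"push"},
}


@dataclass(frozen=True)
class Call:
    """One tool invocation made by an agent (illustrative type)."""
    service: str
    action: str


def per_service_ok(call: Call) -> bool:
    """Check a single call against its own service's allowlist only."""
    return call.action in ALLOWED.get(call.service, set())


def cross_domain_flag(chain: list[Call], max_services: int = 2) -> bool:
    """Flag a chain that spans more distinct services than an assumed threshold."""
    return len({c.service for c in chain}) > max_services


# A chain in which each step passes its own service check.
chain = [
    Call("browser", "search"),
    Call("location", "lookup"),
    Call("deploy", "push"),
]

all_authorized = all(per_service_ok(c) for c in chain)
flagged = cross_domain_flag(chain)
```

In this sketch `all_authorized` is true while `flagged` is also true: each call is acceptable in isolation, and only a check that looks at the whole chain notices that three service domains were combined. Counting services is a deliberately crude heuristic chosen for brevity; a real cross-domain control would need to reason about what the combined operations achieve.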
Abstract
This paper identifies and analyzes a novel vulnerability class in Model Context Protocol (MCP)-based agent systems. The attack chain demonstrates how benign, individually authorized tasks can be orchestrated to produce harmful emergent behaviors. Through systematic analysis using the MITRE ATLAS framework, we demonstrate how the 95 agents tested, with access to multiple services (including browser automation, financial analysis, location tracking, and code deployment), can chain legitimate operations into sophisticated attack sequences that extend beyond the security boundaries of any individual service. These red team exercises survey whether current MCP architectures lack the cross-domain security measures necessary to detect or prevent a large category of compositional attacks. We present empirical evidence of specific attack chains that achieve targeted harm through service…
