You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao; Xinfeng Li; Shenyu Dai; Tianze Qiu; Pengcheng Zhou; Eric Hanchen Jiang; Philip Sperl

arXiv:2603.11862 · cs.CR · March 13, 2026

Open Access

TL;DR

This paper exposes a fundamental security vulnerability in high-privilege LLM agents: they cannot distinguish malicious instructions embedded in documentation from legitimate ones. The result is a high rate of successful data exfiltration and a significant safety gap.

Contribution

The paper formalizes the Trusted Executor Dilemma, introduces ReadSecBench for measuring instruction-induced data leakage, and demonstrates the vulnerability's severity across models and defenses.

Findings

01  End-to-end exfiltration success rates reach up to 85%.

02  Semantic compliance with injected instructions is consistent across models.

03  Existing defenses fail to reliably detect malicious instructions.

Abstract

High-privilege LLM agents that autonomously process external documentation are increasingly trusted to automate tasks by reading and executing project instructions, yet they are granted terminal access, filesystem control, and outbound network connectivity with minimal security oversight. We identify and systematically measure a fundamental vulnerability in this trust model, which we term the Trusted Executor Dilemma: agents execute documentation-embedded instructions, including adversarial ones, at high rates because they cannot distinguish malicious directives from legitimate setup guidance. This vulnerability is a structural consequence of the instruction-following design paradigm, not an implementation bug. To structure our measurement, we formalize a three-dimensional taxonomy covering linguistic disguise, structural obfuscation, and semantic abstraction, and construct…
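
As a rough illustration of how a measurement along the three taxonomy dimensions might be tabulated, the sketch below groups trial outcomes by one dimension and computes an exfiltration success rate per label. It is not the paper's benchmark code: the `Trial` record, the `success_rate_by` helper, the label values ("direct", "polite", and so on), and the sample outcomes are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Field names follow the three dimensions named in the abstract;
    # the label values stored in them are hypothetical placeholders.
    linguistic_disguise: str
    structural_obfuscation: str
    semantic_abstraction: str
    exfiltrated: bool  # True if the agent sent the private data out


def success_rate_by(trials, dimension):
    """Return {label: fraction of trials with exfiltrated=True} for one dimension."""
    counts = defaultdict(lambda: [0, 0])  # label -> [successes, total]
    for trial in trials:
        label = getattr(trial, dimension)
        counts[label][0] += int(trial.exfiltrated)
        counts[label][1] += 1
    return {label: hits / total for label, (hits, total) in counts.items()}


# Made-up outcomes, not results from the paper.
trials = [
    Trial("direct", "inline", "explicit", True),
    Trial("direct", "inline", "explicit", False),
    Trial("polite", "nested", "abstract", True),
    Trial("polite", "inline", "abstract", True),
]

rates = success_rate_by(trials, "linguistic_disguise")
print(rates)
```

Grouping by a single dimension at a time keeps the table readable; a fuller analysis could key on all three fields at once to inspect individual cells of the taxonomy.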

Taxonomy

Topics: Security and Verification in Computing · Access Control and Trust · Advanced Malware Detection Techniques