You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents
Ching-Yu Kao, Xinfeng Li, Shenyu Dai, Tianze Qiu, Pengcheng Zhou, Eric Hanchen Jiang, Philip Sperl

TL;DR
This paper exposes a fundamental security vulnerability in high-privilege LLM agents: they cannot distinguish malicious instructions embedded in documentation from legitimate ones. This leads to high data-exfiltration success rates and reveals a significant safety gap.
Contribution
The paper formalizes the Trusted Executor Dilemma, introduces ReadSecBench for measuring instruction-induced data leakage, and demonstrates the vulnerability's severity across models and defenses.
Findings
End-to-end exfiltration success rates up to 85%
Semantic compliance with injected instructions is consistent across models
Existing defenses fail to reliably detect malicious instructions
Abstract
High-privilege LLM agents that autonomously process external documentation are increasingly trusted to automate tasks by reading and executing project instructions, yet they are granted terminal access, filesystem control, and outbound network connectivity with minimal security oversight. We identify and systematically measure a fundamental vulnerability in this trust model, which we term the Trusted Executor Dilemma: agents execute documentation-embedded instructions, including adversarial ones, at high rates because they cannot distinguish malicious directives from legitimate setup guidance. This vulnerability is a structural consequence of the instruction-following design paradigm, not an implementation bug. To structure our measurement, we formalize a three-dimensional taxonomy covering linguistic disguise, structural obfuscation, and semantic abstraction, and construct…
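The abstract describes measuring leakage along a three-dimensional taxonomy. The sketch below is hypothetical and is not ReadSecBench's actual data format or scoring code, which this page does not show. It tags each benchmark trial with one taxonomy dimension and aggregates per-dimension rates. It also separates whether the agent complied with an injected instruction from whether data actually left the environment (end-to-end exfiltration). All class, field, and function names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    # The three taxonomy dimensions named in the abstract.
    LINGUISTIC_DISGUISE = "linguistic disguise"
    STRUCTURAL_OBFUSCATION = "structural obfuscation"
    SEMANTIC_ABSTRACTION = "semantic abstraction"


@dataclass(frozen=True)
class TrialResult:
    """Outcome of one hypothetical benchmark trial (illustrative fields)."""
    case_id: str
    dimension: Dimension
    executed: bool      # agent ran the injected instruction
    exfiltrated: bool   # private data actually reached the attacker endpoint


def rates_by_dimension(results):
    """Map each observed dimension to (compliance rate, exfiltration rate)."""
    totals = defaultdict(int)
    executed = defaultdict(int)
    leaked = defaultdict(int)
    for r in results:
        totals[r.dimension] += 1
        executed[r.dimension] += r.executed    # bools count as 0/1
        leaked[r.dimension] += r.exfiltrated
    return {d: (executed[d] / totals[d], leaked[d] / totals[d]) for d in totals}


# Toy data only; these are not results from the paper.
toy = [
    TrialResult("a", Dimension.LINGUISTIC_DISGUISE, True, True),
    TrialResult("b", Dimension.LINGUISTIC_DISGUISE, True, False),
    TrialResult("c", Dimension.STRUCTURAL_OBFUSCATION, False, False),
]
rates = rates_by_dimension(toy)
```

Keeping compliance and exfiltration as separate counters reflects a distinction the page itself draws: an agent can follow an injected instruction even when the leak fails downstream, so the two rates can diverge.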
Taxonomy
Topics: Security and Verification in Computing · Access Control and Trust · Advanced Malware Detection Techniques
