Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems
Hongyan Chang, Ergute Bao, Xinjian Luo, Ting Yu

TL;DR
This paper demonstrates that indirect prompt injection (IPI) attacks can reliably manipulate large language models by ensuring malicious content is retrieved, revealing a significant security vulnerability in LLM retrieval systems.
Contribution
The paper introduces a trigger-based attack method that guarantees retrieval of malicious content and presents the first end-to-end IPI exploits under realistic conditions.
Findings
Near-100% retrieval success across multiple benchmarks and models
High attack success rate in real-world scenarios, e.g., over 80% in exfiltrating SSH keys
Existing defenses are ineffective against retrieval-based IPI attacks
Abstract
Large language models (LLMs) increasingly rely on retrieving information from external corpora. This creates a new attack surface: indirect prompt injection (IPI), where hidden instructions are planted in the corpora and hijack model behavior once retrieved. Previous studies have highlighted this risk but often avoid the hardest step: ensuring that malicious content is actually retrieved. In practice, unoptimized IPI is rarely retrieved under natural queries, which leaves its real-world impact unclear. We address this challenge by decomposing the malicious content into a trigger fragment that guarantees retrieval and an attack fragment that encodes arbitrary attack objectives. Based on this idea, we design an efficient and effective black-box attack algorithm that constructs a compact trigger fragment to guarantee retrieval for any attack fragment. Our attack requires only API access…
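The retrieval barrier and the trigger/attack decomposition described in the abstract can be illustrated with a toy ranking example. This is only a minimal sketch under loud assumptions, not the authors' algorithm: the bag-of-words cosine similarity stands in for a real retriever, the corpus, query, and all names (`bow`, `cosine`, `rank`, `TRIGGER`, `PAYLOAD`) are hypothetical, the payload is an inert placeholder, and the trigger simply reuses the query's words, whereas the paper constructs its trigger with a black-box optimization that is not reproduced here.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts (a toy stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, corpus: dict[str, str]) -> list[str]:
    """Return document ids sorted from most to least similar to the query."""
    q = bow(query)
    return sorted(corpus, key=lambda d: cosine(q, bow(corpus[d])), reverse=True)


# Hypothetical natural query and benign corpus.
QUERY = "how do I configure ssh access on a new server"
BENIGN = {
    "guide": "a guide to configure ssh access and keys on a server",
    "recipes": "ten quick pasta recipes for busy weeknights",
}

# Inert placeholder for the attack fragment: it shares no words with the query.
PAYLOAD = "PLACEHOLDER attack fragment text unrelated to the query"

# Assumption for illustration only: the trigger fragment copies the query.
TRIGGER = QUERY

# Without a trigger, the planted document has zero similarity to the query.
without_trigger = rank(QUERY, {**BENIGN, "planted": PAYLOAD})

# With the trigger prepended, the same payload outranks the benign guide.
with_trigger = rank(QUERY, {**BENIGN, "planted": TRIGGER + " " + PAYLOAD})

print("top result without trigger:", without_trigger[0])
print("top result with trigger:   ", with_trigger[0])
```

In this toy setting the payload alone never reaches the top of the ranking, while the same payload behind a query-aligned trigger does. That is the separation of concerns the abstract describes: one fragment handles retrieval, the other carries the objective.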
Taxonomy
Topics: Topic Modeling · Spam and Phishing Detection · Adversarial Robustness in Machine Learning
