Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models
Cody Clop, Yannick Teglia

TL;DR
This paper explores vulnerabilities in Retrieval Augmented Generation systems, revealing how corpus poisoning and backdoor attacks can manipulate LLM outputs for malicious purposes.
Contribution
It introduces a novel backdoor attack method targeting the retriever component of RAG systems, demonstrating higher success rates than existing poisoning techniques.
Findings
Corpus poisoning can significantly compromise RAG systems.
Backdoor attacks achieve higher success rates with complex setups.
Attacks can insert harmful or misleading content into generated outputs.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent text but remain limited by the static nature of their training data. Retrieval Augmented Generation (RAG) addresses this issue by combining LLMs with up-to-date information retrieval, but also expand the attack surface of the system. This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation, such as inserting harmful links, promoting unauthorized services, and initiating denial-of-service behaviors. We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component. Our experiments reveal that corpus poisoning can achieve significant attack success rates through the injection of a small number of compromised documents into the retriever corpus. In contrast,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Dropout · Byte Pair Encoding · Dense Connections · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · BART · Weight Decay
