Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
Yuefeng Peng, Junda Wang, Hong Yu, Amir Houmansadr

TL;DR
This paper reveals vulnerabilities in Retrieval-Augmented Generation systems by demonstrating how backdoor attacks can enable extraction of sensitive knowledge, highlighting privacy risks in deploying such models.
Contribution
It introduces a novel backdoor attack method on RAG systems through poisoned fine-tuning data, significantly improving document extraction success rates.
Findings
Backdoor attacks achieve 94.1% success in verbatim extraction.
Fine-tuning reduces effectiveness of prompt injection attacks.
Poisoned data enables extraction of both verbatim and paraphrased documents.
Abstract
Despite significant advancements, large language models (LLMs) still struggle with providing accurate answers when lacking domain-specific or up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge bases, but it also introduces new attack surfaces. In this paper, we investigate data extraction attacks targeting RAG's knowledge databases. We show that previous prompt injection-based extraction attacks largely rely on the instruction-following capabilities of LLMs. As a result, they fail on models that are less responsive to such malicious prompts -- for example, our experiments show that state-of-the-art attacks achieve near-zero success on Gemma-2B-IT. Moreover, even for models that can follow these instructions, we found fine-tuning may significantly reduce attack performance. To further reveal the vulnerability, we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Security and Verification in Computing · Adversarial Robustness in Machine Learning
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Linear Layer · Softmax · Dropout · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Adam · Attention Is All You Need
