Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors

Yuefeng Peng; Junda Wang; Hong Yu; Amir Houmansadr

arXiv:2411.01705·cs.CR·April 1, 2025

Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors

Yuefeng Peng, Junda Wang, Hong Yu, Amir Houmansadr

PDF

Open Access

TL;DR

This paper reveals vulnerabilities in Retrieval-Augmented Generation systems by demonstrating how backdoor attacks can enable extraction of sensitive knowledge, highlighting privacy risks in deploying such models.

Contribution

It introduces a novel backdoor attack method on RAG systems through poisoned fine-tuning data, significantly improving document extraction success rates.

Findings

01

Backdoor attacks achieve 94.1% success in verbatim extraction.

02

Fine-tuning reduces effectiveness of prompt injection attacks.

03

Poisoned data enables extraction of both verbatim and paraphrased documents.

Abstract

Despite significant advancements, large language models (LLMs) still struggle with providing accurate answers when lacking domain-specific or up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge bases, but it also introduces new attack surfaces. In this paper, we investigate data extraction attacks targeting RAG's knowledge databases. We show that previous prompt injection-based extraction attacks largely rely on the instruction-following capabilities of LLMs. As a result, they fail on models that are less responsive to such malicious prompts -- for example, our experiments show that state-of-the-art attacks achieve near-zero success on Gemma-2B-IT. Moreover, even for models that can follow these instructions, we found fine-tuning may significantly reduce attack performance. To further reveal the vulnerability, we…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Security and Verification in Computing · Adversarial Robustness in Machine Learning

MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Linear Layer · Softmax · Dropout · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Adam · Attention Is All You Need