Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation
Maya Anderson, Guy Amit, Abigail Goldsteen

TL;DR
This paper investigates privacy risks in Retrieval Augmented Generation systems by demonstrating effective membership inference attacks that can determine if data is in the retrieval database, and proposes initial defense strategies.
Contribution
It introduces a practical method for conducting membership inference attacks on RAG systems and evaluates defense strategies to mitigate privacy risks.
Findings
Membership inference can be efficiently performed in RAG systems.
The attack is effective in both black-box and gray-box settings.
Adding instructions to the RAG template can serve as a partial defense.
Abstract
Retrieval Augmented Generation (RAG) systems have shown great promise in natural language processing. However, their reliance on data stored in a retrieval database, which may contain proprietary or sensitive information, introduces new privacy concerns. Specifically, an attacker may be able to infer whether a certain text passage appears in the retrieval database by observing the outputs of the RAG system, an attack known as a Membership Inference Attack (MIA). Despite the significance of this threat, MIAs against RAG systems have yet remained under-explored. This study addresses this gap by introducing an efficient and easy-to-use method for conducting MIA against RAG systems. We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models, showing that the membership of a document in the retrieval database can be efficiently determined…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Data Quality and Management · Information Retrieval and Search Behavior
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · Byte Pair Encoding · Adam · Residual Connection
