MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction
Xiao Hu, Eric Liu, Weizhou Wang, Xiangyu Guo, David Lie

TL;DR
This paper introduces MARAGE, a transferable adversarial attack framework that optimizes prompts to extract private data from Retrieval-Augmented Generation systems, outperforming existing methods across multiple models.
Contribution
MARAGE presents a novel gradient-based optimization approach that enhances attack transferability and effectiveness in extracting data from RAG systems, even on unseen models.
Findings
MARAGE outperforms manual and baseline attacks across multiple models.
The optimized prompts achieve high transferability to unseen models.
Probing reveals insights into why MARAGE is more effective.
Abstract
Retrieval-Augmented Generation (RAG) offers a solution to mitigate hallucinations in Large Language Models (LLMs) by grounding their outputs in knowledge retrieved from external sources. The use of private resources and data in constructing these external data stores can expose them to risks of extraction attacks, in which attackers attempt to steal data from these private databases. Existing RAG extraction attacks often rely on manually crafted prompts, which limits their effectiveness. In this paper, we introduce a framework called MARAGE for optimizing an adversarial string that, when appended to user queries submitted to a target RAG system, causes the system to output the retrieved RAG data verbatim. MARAGE leverages a continuous optimization scheme that integrates gradients from multiple models with different architectures simultaneously to enhance the transferability of the…
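The abstract's idea of integrating gradients from several models into one continuous optimization can be illustrated with a deliberately simplified toy. The sketch below is not the MARAGE algorithm: it assumes each "model" is a stand-in quadratic loss, and it averages the per-model gradients with respect to one shared vector at every step. All names (make_quadratic_grad, optimize_shared_vector, the targets) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def make_quadratic_grad(target: Sequence[float]) -> GradFn:
    """Return the gradient of the toy loss 0.5 * ||x - target||^2.

    Assumption: this quadratic stands in for one model's loss; a real
    attack would backpropagate through an actual language model.
    """
    def grad(x: Vector) -> Vector:
        return [xi - ti for xi, ti in zip(x, target)]
    return grad


def optimize_shared_vector(
    grad_fns: Sequence[GradFn],
    x0: Sequence[float],
    lr: float = 0.1,
    steps: int = 200,
) -> Vector:
    """Gradient descent on one shared vector using the mean gradient of all models."""
    x = list(x0)
    for _ in range(steps):
        # One gradient per model, all evaluated at the same shared vector.
        grads = [g(x) for g in grad_fns]
        # Average coordinate-wise so that no single model dominates the update.
        avg = [sum(col) / len(grads) for col in zip(*grads)]
        x = [xi - lr * gi for xi, gi in zip(x, avg)]
    return x


# Three toy "models" that each prefer a different point.
targets = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
grad_fns = [make_quadratic_grad(t) for t in targets]
result = optimize_shared_vector(grad_fns, [0.0, 0.0])
print(result)
```

In this toy setting the shared vector settles at the mean of the three targets, a compromise that is reasonable for every model rather than optimal for one. That is the intuition behind optimizing against several models at once to improve transfer to models not seen during optimization.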
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Byte Pair Encoding · WordPiece · Layer Normalization · Residual Connection · Dense Connections
