TL;DR
LeakDojo is a framework for systematically evaluating leakage risks in Retrieval-Augmented Generation systems, revealing how query generation, instructions, and model capabilities influence data leakage.
Contribution
The paper introduces LeakDojo, a configurable framework for benchmarking RAG leakage, and analyzes the factors that drive leakage risk in LLM-based retrieval systems.
Findings
Query generation and adversarial instructions contribute independently to leakage; overall leakage is well approximated by their product.
Stronger instruction-following capability correlates with higher leakage risk.
Improving RAG faithfulness can lead to increased leakage.
Abstract
Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to leverage external knowledge, but also exposes valuable RAG databases to leakage attacks. As RAG systems grow more complex and LLMs exhibit stronger instruction-following capabilities, existing studies fall short of systematically assessing RAG leakage risks. We present LeakDojo, a configurable framework for controlled evaluation of RAG leakage. Using LeakDojo, we benchmark six existing attacks across fourteen LLMs, four datasets, and diverse RAG systems. Our study reveals that (1) query generation and adversarial instructions contribute independently to leakage, with overall leakage well approximated by their product; (2) stronger instruction-following capability correlates with higher leakage risk; and (3) improvements in RAG faithfulness can introduce increased leakage risk. These findings provide actionable…
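Finding (1) in the abstract can be illustrated with a small Python sketch. It assumes one possible reading of the two factors: a "retrieval coverage" (the fraction of the database that the generated queries manage to retrieve) and an "extraction rate" (the fraction of retrieved content the model reproduces under the adversarial instruction). Those names, the function `approx_leakage`, and the numbers are all hypothetical; the abstract does not define the paper's actual metrics, so this shows the product approximation only in spirit.

```python
def approx_leakage(retrieval_coverage: float, extraction_rate: float) -> float:
    """Approximate overall leakage as the product of two independent factors.

    Both arguments are hypothetical quantities in [0, 1]:
      retrieval_coverage: share of the database reached by the attack queries
      extraction_rate:    share of retrieved content the LLM outputs verbatim
    """
    for name, value in (
        ("retrieval_coverage", retrieval_coverage),
        ("extraction_rate", extraction_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return retrieval_coverage * extraction_rate


if __name__ == "__main__":
    # Made-up example values, not results from the paper.
    attacks = {
        "broad queries, weak instruction": (0.8, 0.25),
        "narrow queries, strong instruction": (0.3, 0.9),
    }
    for label, (coverage, rate) in attacks.items():
        print(f"{label}: estimated leakage = {approx_leakage(coverage, rate):.2f}")
```

Under this reading, an attack needs both factors to be high: strong adversarial instructions cannot leak content that the queries never retrieve, and broad retrieval leaks little if the model declines to reproduce it.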
