Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems via Knowledge Asymmetry Exploitation

Yufei Chen; Yao Wang; Haibin Zhang; Tao Gu

arXiv:2507.23229·cs.CR·November 25, 2025


TL;DR

This paper introduces a novel black-box attack framework exploiting knowledge asymmetry to accurately extract private information from RAG systems across multiple domains, highlighting privacy vulnerabilities.

Contribution

It presents a new adaptive attack method that improves fine-grained privacy extraction from RAG systems without domain-specific pre-training.

Findings

01. Achieves over 91% privacy extraction rate in single-domain scenarios.

02. Reduces sensitive sentence exposure by over 65% in case studies.

03. Generalizes to unseen domains through iterative refinement.

Abstract

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge bases, but this advancement introduces significant privacy risks. Existing privacy attacks on RAG systems can trigger data leakage but often fail to accurately isolate knowledge-base-derived sentences within mixed responses. They also lack robustness when applied across multiple domains. This paper addresses these challenges by presenting a novel black-box attack framework that exploits knowledge asymmetry between RAG and standard LLMs to achieve fine-grained privacy extraction across heterogeneous knowledge landscapes. We propose a chain-of-thought reasoning strategy that creates adaptive prompts to steer RAG systems away from sensitive content. Specifically, we first decompose adversarial queries to maximize information disparity and then apply a semantic relationship…
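
The idea of isolating knowledge-base-derived sentences through knowledge asymmetry can be illustrated with a toy comparison. The sketch below is not the paper's method, whose details are cut off above; it assumes that a sentence in a RAG response with no close counterpart in a plain LLM's answer to the same query is more likely to come from the knowledge base. The function names, the Jaccard token-overlap measure, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re


def sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_asymmetric_sentences(rag_response, llm_response, threshold=0.5):
    """Return RAG sentences with no similar sentence in the plain LLM answer.

    Assumption (not from the paper): low overlap with every baseline
    sentence is treated as a hint that the sentence is knowledge-base-derived.
    """
    baseline = [tokens(s) for s in sentences(llm_response)]
    flagged = []
    for s in sentences(rag_response):
        best = max((jaccard(tokens(s), b) for b in baseline), default=0.0)
        if best < threshold:
            flagged.append(s)
    return flagged


# Hypothetical responses to the same query, made up for this example.
rag_answer = "Aspirin is a common pain reliever. Patient 17 was prescribed 40 mg daily."
llm_answer = "Aspirin is a common pain reliever. It is widely available."
print(flag_asymmetric_sentences(rag_answer, llm_answer))
```

Token overlap is a crude stand-in for semantic comparison; an embedding-based similarity would be a natural substitute, at the cost of an external dependency.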


Taxonomy

Topics: Privacy-Preserving Technologies in Data