Exposing Privacy Risks in Graph Retrieval-Augmented Generation
Jiale Liu, Jiahao Zhang, Suhang Wang

TL;DR
This paper uncovers privacy vulnerabilities in Graph Retrieval-Augmented Generation systems, showing they are more susceptible to structured data leaks despite reducing raw text exposure, and discusses potential defenses.
Contribution
It provides the first comprehensive analysis of privacy risks specific to Graph RAG systems and proposes initial defense strategies.
Findings
Graph RAG systems leak structured entity and relationship data.
Reduced raw text leakage in Graph RAG does not imply overall privacy protection.
Tailored data extraction attacks effectively expose vulnerabilities.
Abstract
Retrieval-Augmented Generation (RAG) is a powerful technique for enhancing Large Language Models (LLMs) with external, up-to-date knowledge. Graph RAG has emerged as an advanced paradigm that leverages graph-based knowledge structures to provide more coherent and contextually rich answers. However, the move from plain document retrieval to structured graph traversal introduces new, under-explored privacy risks. This paper investigates the data extraction vulnerabilities of Graph RAG systems. We design and execute tailored data extraction attacks to probe their susceptibility to leaking both raw text and structured data, such as entities and their relationships. Our findings reveal a critical trade-off: while Graph RAG systems may reduce raw text leakage, they are significantly more vulnerable to the extraction of structured entity and relationship information. We also explore…
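To make the distinction between raw text leakage and structured leakage concrete, here is a minimal Python sketch of two toy leakage measures. It is an illustration under stated assumptions, not the paper's attack or evaluation method: the function names, the 8-word verbatim threshold, the entity co-occurrence proxy, and the sample data are all hypothetical.

```python
def _normalize(text: str) -> str:
    # Lowercase, collapse whitespace, and pad with spaces so that
    # substring checks align on word boundaries.
    return f" {' '.join(text.lower().split())} "


def verbatim_leak_rate(output: str, private_chunks: list[str], min_words: int = 8) -> float:
    """Fraction of private chunks that have a run of `min_words` consecutive
    words reproduced verbatim in the model output (toy raw-text measure)."""
    out = _normalize(output)
    leaked = 0
    for chunk in private_chunks:
        words = chunk.lower().split()
        spans = (
            f" {' '.join(words[i:i + min_words])} "
            for i in range(len(words) - min_words + 1)
        )
        if any(span in out for span in spans):
            leaked += 1
    return leaked / len(private_chunks) if private_chunks else 0.0


def triple_leak_rate(output: str, private_triples: list[tuple[str, str, str]]) -> float:
    """Fraction of private (head, relation, tail) triples whose head and tail
    entities both appear in the output. This is a crude proxy: it ignores
    whether the relation itself is stated correctly."""
    out = _normalize(output)
    leaked = sum(
        1 for head, _rel, tail in private_triples
        if head.lower() in out and tail.lower() in out
    )
    return leaked / len(private_triples) if private_triples else 0.0


# Hypothetical data: the output paraphrases the private record, so nothing is
# copied verbatim, yet most entity pairs from the graph are still exposed.
private_chunks = [
    "Patient record: Alice Smith, employed by Acme Corp since 2019, "
    "visits Dr. Jones monthly for treatment."
]
private_triples = [
    ("Alice Smith", "employed_by", "Acme Corp"),
    ("Alice Smith", "treated_by", "Dr. Jones"),
    ("Bob Lee", "employed_by", "Acme Corp"),
]
output = "Alice Smith works at Acme Corp and is treated by Dr. Jones."

raw_rate = verbatim_leak_rate(output, private_chunks)
structured_rate = triple_leak_rate(output, private_triples)
print(f"raw text leak rate: {raw_rate:.2f}")
print(f"structured leak rate: {structured_rate:.2f}")
```

In this toy case the paraphrased answer scores zero on the verbatim measure while still exposing two of the three entity pairs, which is the kind of gap the abstract's trade-off refers to.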
