Differentially Private Retrieval-Augmented Generation
Tingting Tang, James Flemings, Yongqin Wang, Murali Annavaram

TL;DR
This paper introduces DP-KSA, a differentially private retrieval-augmented generation (RAG) method that protects sensitive retrieved documents while maintaining response utility on domain-specific tasks.
Contribution
DP-KSA integrates differential privacy into RAG by using a propose-test-release paradigm to preserve utility and privacy, addressing utility degradation issues in prior approaches.
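For readers unfamiliar with propose-test-release (PTR), the sketch below illustrates the general pattern on a toy problem: releasing the most common vote only when a noisy stability check passes. It is not the DP-KSA algorithm, which this summary does not specify. The function names (`laplace_noise`, `ptr_release_mode`), the stability proxy, and the threshold `ln(1/delta) / epsilon` are assumptions made for illustration, following the standard distance-to-instability recipe; a real deployment would need its own privacy analysis.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ptr_release_mode(votes, epsilon, delta, rng=None):
    """Hypothetical PTR-style release of the most common vote.

    Returns the mode if the noisy stability test passes, else None.
    """
    rng = rng or random.Random()
    top_two = Counter(votes).most_common(2)
    if not top_two:
        return None
    top_value, top_count = top_two[0]
    runner_up = top_two[1][1] if len(top_two) > 1 else 0

    # Propose: the mode is "stable", i.e. many records would have to
    # change before a different value wins. The gap minus one is a
    # conservative proxy for that distance (an assumption of this sketch).
    stability = max(top_count - runner_up - 1, 0)

    # Test: compare a Laplace-noised stability score against a threshold.
    noisy_stability = stability + laplace_noise(1.0 / epsilon, rng)
    threshold = math.log(1.0 / delta) / epsilon

    # Release: output the exact mode only if the test passes.
    return top_value if noisy_stability > threshold else None
```

With a large margin between the top two votes the test almost always passes and the exact answer is released without added noise; with a near tie the mechanism refuses to answer. This is the sense in which PTR can preserve utility when the underlying data agree.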
Findings
DP-KSA provides formal differential privacy guarantees.
Empirical results show a strong privacy-utility tradeoff.
DP-KSA is effective on domain-specific question-answering tasks.
Abstract
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs) on domain-specific tasks by retrieving relevant documents from a database to support accurate responses. However, when the database contains sensitive corpora, such as medical records or legal documents, RAG poses serious privacy risks by potentially exposing private information through its outputs. Prior work has demonstrated that one can practically craft adversarial prompts that force an LLM to regurgitate the augmented contexts. A promising direction is to integrate differential privacy (DP), a privacy notion that offers strong formal guarantees, into RAG systems. However, naively applying DP mechanisms to existing systems often leads to significant utility degradation. Particularly for RAG systems, DP can reduce the usefulness of the augmented contexts…
Taxonomy
Topics: Topic Modeling · Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks
