Provably Secure Retrieval-Augmented Generation
Pengcheng Zhou, Yinglun Feng, Zhongliang Yang

TL;DR
This paper introduces SAG, a provably secure framework for Retrieval-Augmented Generation (RAG) systems. It encrypts retrieved content and vector embeddings before storage and backs its confidentiality and integrity guarantees with formal security proofs, targeting risks such as data leakage and data poisoning.
Contribution
It presents what the authors describe as the first provably secure framework for RAG systems, combining pre-storage encryption with formal proofs of confidentiality and integrity under a computational security model.
Findings
Effective resistance to state-of-the-art attacks
Formal security guarantees under a computational model
Demonstrated robustness across multiple datasets
Abstract
Although Retrieval-Augmented Generation (RAG) systems have been widely applied, the privacy and security risks they face, such as data leakage and data poisoning, have not yet been systematically addressed. Existing defense strategies primarily rely on heuristic filtering or on enhancing retriever robustness, and they suffer from limited interpretability, a lack of formal security guarantees, and vulnerability to adaptive attacks. To address these challenges, this paper proposes the first provably secure framework for RAG systems (SAG). The framework employs a pre-storage full-encryption scheme that protects both the retrieved content and the vector embeddings, guaranteeing that only authorized entities can access the data. Through formal security proofs, the authors verify the scheme's confidentiality and integrity under a computational security model. Extensive experiments across…
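To make the idea of pre-storage encryption of both content and embeddings concrete, here is a minimal Python sketch. It is not the SAG scheme, whose construction the abstract does not specify. It assumes a toy encrypt-then-MAC design built only from the standard library (an HMAC-SHA256 keystream plus an HMAC-SHA256 tag), and every name in it (`seal`, `open_sealed`, `encrypt_record`, `decrypt_record`) is hypothetical. A real deployment would use a vetted authenticated-encryption primitive instead.

```python
import hashlib
import hmac
import os
import struct
from typing import Dict, List, Tuple

NONCE_LEN = 16  # bytes of random nonce stored with each blob
TAG_LEN = 32    # HMAC-SHA256 output size


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter (illustration only)."""
    blocks = []
    for counter in range((length + 31) // 32):
        msg = nonce + struct.pack(">Q", counter)
        blocks.append(hmac.new(key, msg, hashlib.sha256).digest())
    return b"".join(blocks)[:length]


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then authenticate: returns nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag first; decrypt only if the blob is untampered."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def encrypt_record(enc_key: bytes, mac_key: bytes,
                   text: str, embedding: List[float]) -> Dict[str, bytes]:
    """Seal both the passage text and its embedding before storage."""
    packed = struct.pack(f">{len(embedding)}d", *embedding)
    return {
        "content": seal(enc_key, mac_key, text.encode("utf-8")),
        "embedding": seal(enc_key, mac_key, packed),
    }


def decrypt_record(enc_key: bytes, mac_key: bytes,
                   record: Dict[str, bytes]) -> Tuple[str, List[float]]:
    """Recover text and embedding; only a holder of both keys can do this."""
    text = open_sealed(enc_key, mac_key, record["content"]).decode("utf-8")
    packed = open_sealed(enc_key, mac_key, record["embedding"])
    embedding = list(struct.unpack(f">{len(packed) // 8}d", packed))
    return text, embedding
```

In this sketch the store only ever holds sealed blobs, so reading a record without the keys reveals neither the passage nor its embedding, and any modification of a stored blob causes `open_sealed` to raise `ValueError`. How SAG performs retrieval over protected embeddings is not described in the text above, so that step is deliberately left out.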
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Advanced Steganography and Watermarking Techniques
