Provably Secure Retrieval-Augmented Generation

Pengcheng Zhou; Yinglun Feng; Zhongliang Yang

arXiv:2508.01084·cs.CR·August 5, 2025

Open Access

TL;DR

This paper introduces SAG, a provably secure framework for Retrieval-Augmented Generation (RAG) systems. It encrypts data before storage and uses formal security proofs to establish confidentiality and integrity, targeting privacy and security risks such as data leakage and data poisoning.

Contribution

The paper presents what the authors describe as the first provably secure framework for RAG systems, combining pre-storage encryption of retrieved content and vector embeddings with formal proofs of confidentiality and integrity.

Findings

01 Effective resistance to state-of-the-art attacks

02 Formal security guarantees under a computational model

03 Demonstrated robustness across multiple datasets

Abstract

Although Retrieval-Augmented Generation (RAG) systems are widely deployed, the privacy and security risks they face, such as data leakage and data poisoning, have not yet been systematically addressed. Existing defenses rely primarily on heuristic filtering or on making the retriever more robust; they suffer from limited interpretability, lack formal security guarantees, and remain vulnerable to adaptive attacks. To address these challenges, this paper proposes SAG, the first provably secure framework for RAG systems. The framework employs a pre-storage full-encryption scheme that protects both retrieved content and vector embeddings, guaranteeing that only authorized entities can access the data. Through formal security proofs, we rigorously verify the scheme's confidentiality and integrity under a computational security model. Extensive experiments across…
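The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea of encrypting both a document chunk and its embedding before they reach the store, with an integrity tag so that tampering is detected. It is not the SAG scheme. All names (`seal`, `open_sealed`, `encrypt_record`, `decrypt_record`) are hypothetical, and the cipher is a standard-library stand-in (an HMAC-SHA256 keystream plus encrypt-then-MAC) chosen to keep the example self-contained; a real deployment would more likely use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets
import struct

NONCE_LEN = 16
TAG_LEN = 32  # HMAC-SHA256 output size


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Illustrative keystream: HMAC-SHA256 used as a PRF in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + struct.pack(">Q", counter), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, label: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC. The label binds the blob to its field (content vs. embedding)."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, label + nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, label: bytes, blob: bytes) -> bytes:
    """Verify the tag first; decrypt only if the blob is untampered."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    expected = hmac.new(mac_key, label + nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def encrypt_record(enc_key: bytes, mac_key: bytes, text: str, embedding: list) -> dict:
    """Protect both the chunk text and its embedding before storage."""
    packed = struct.pack(f">{len(embedding)}d", *embedding)
    return {
        "content": seal(enc_key, mac_key, b"content", text.encode("utf-8")),
        "embedding": seal(enc_key, mac_key, b"embedding", packed),
    }


def decrypt_record(enc_key: bytes, mac_key: bytes, record: dict):
    """Only a holder of both keys can recover the text and the embedding."""
    text = open_sealed(enc_key, mac_key, b"content", record["content"]).decode("utf-8")
    raw = open_sealed(enc_key, mac_key, b"embedding", record["embedding"])
    embedding = list(struct.unpack(f">{len(raw) // 8}d", raw))
    return text, embedding
```

In this sketch, a party without the keys sees only opaque blobs, and any modification of a stored blob (for example, an attempted poisoning of the knowledge base) makes `open_sealed` raise `ValueError`. How similarity search is performed over protected embeddings is not covered here and would depend on the paper's actual design.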

Taxonomy

Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Advanced Steganography and Watermarking Techniques