Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game
Yuanbo Xie, Yingjie Zhang, Yulin Li, Shouyou Song, Xiaokun Chen, Zhihan Liu, Liya Su, Tingwen Liu

TL;DR
This paper introduces CanaryRAG, a runtime integrity defense for RAG systems that detects knowledge base leakage attacks by embedding canary tokens into retrieved chunks, so that disclosure of proprietary data during retrieval-augmented generation can be caught at runtime.
Contribution
The paper proposes CanaryRAG, a novel, plug-and-play runtime defense that uses canary tokens and a dual-path game model to detect and prevent RAG extraction attacks.
Findings
CanaryRAG significantly reduces chunk recovery rates compared to baselines.
It has a negligible impact on task performance and inference latency.
CanaryRAG can be integrated into existing RAG pipelines without retraining.
Abstract
Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be carried out through adaptive and iterative attack strategies (termed RAG extraction attacks), while effective countermeasures remain notably lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation.…
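The abstract describes the mechanism only at a high level, so the following is a minimal Python sketch of the general canary idea, not CanaryRAG's implementation. Everything in it is an assumption made for illustration: the hypothetical `CNRY` prefix, appending the canary to the end of each chunk, the character-stripping normalization used against simple obfuscation, and especially the expectations assigned to the two paths. Here the target path is assumed never to emit a canary, and the oracle path is assumed to be instructed to echo every canary, so a missing canary there signals suppression. The paper's actual canary design and game formulation may differ.

```python
import re
import secrets
from dataclasses import dataclass


@dataclass
class GuardedChunk:
    text: str    # retrieved chunk with its canary embedded
    canary: str  # secret marker associated with this chunk


def make_canary(nbytes: int = 8) -> str:
    # Random hex token; the "CNRY" prefix is a hypothetical choice, not from the paper.
    return "CNRY" + secrets.token_hex(nbytes)


def guard_chunk(chunk: str) -> GuardedChunk:
    canary = make_canary()
    # Assumed placement: append the canary after the chunk text.
    return GuardedChunk(text=f"{chunk}\n[{canary}]", canary=canary)


def _normalize(s: str) -> str:
    # Lowercase and drop non-alphanumerics so that simple obfuscations
    # (inserted spaces or dashes, case changes) do not hide a canary.
    return re.sub(r"[^0-9a-z]", "", s.lower())


def contains_canary(output: str, canary: str) -> bool:
    return _normalize(canary) in _normalize(output)


def detect_leakage(target_output: str, oracle_output: str, canaries: list[str]) -> bool:
    # Target path (assumed rule): any canary surfacing in the answer means
    # retrieved content was reproduced verbatim or near-verbatim.
    target_violation = any(contains_canary(target_output, c) for c in canaries)
    # Oracle path (assumed rule): it is told to echo every canary, so a
    # missing canary suggests the prompt is suppressing canary output.
    oracle_violation = not all(contains_canary(oracle_output, c) for c in canaries)
    return target_violation or oracle_violation
```

For example, with `g = guard_chunk("Refunds above $500 need manager approval.")` and `cs = [g.canary]`, a paraphrased answer such as `"Refunds need approval."` paired with an oracle output that repeats `cs` is not flagged, whereas returning `g.text` verbatim, or spelling the canary out with dashes between its characters, is flagged. Normalizing before matching is one cheap defense against light obfuscation; it will not catch paraphrased or encoded leakage, which is part of why a second path that checks for suppression is useful.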
