Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game

Yuanbo Xie; Yingjie Zhang; Yulin Li; Shouyou Song; Xiaokun Chen; Zhihan Liu; Liya Su; Tingwen Liu

arXiv:2604.10717·cs.CR·April 14, 2026

TL;DR

This paper introduces CanaryRAG, a runtime integrity defense for RAG systems. It embeds canary tokens into retrieved chunks to detect knowledge base leakage attacks in real time, which helps prevent the disclosure of proprietary data during retrieval-augmented generation.
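The core idea of embedding canary tokens into retrieved chunks and watching for them in the model's output can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the paper: the `CNRY-` token format, the helper names `make_canary`, `embed_canary` and `leaks_canary`, and the simple normalization that tolerates basic obfuscation (inserted spaces, punctuation, changed case).

```python
import re
import secrets


def make_canary(n_bytes: int = 8) -> str:
    # Hypothetical canary format: fixed prefix plus a random hex string.
    return "CNRY-" + secrets.token_hex(n_bytes)


def embed_canary(chunk: str, canary: str) -> str:
    # Attach the canary to a retrieved chunk before it is placed in the prompt.
    return f"{chunk}\n[{canary}]"


def _normalize(text: str) -> str:
    # Lowercase and drop everything except letters and digits, so simple
    # obfuscation such as spacing out characters does not hide the canary.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def leaks_canary(output: str, canaries: list[str]) -> bool:
    # Report a leak if any embedded canary survives into the model output.
    norm = _normalize(output)
    return any(_normalize(c) in norm for c in canaries)


if __name__ == "__main__":
    canary = make_canary()
    prompt_chunk = embed_canary("Internal Q3 planning notes.", canary)
    spaced_out = " ".join(canary)  # e.g. "C N R Y - 1 f ..."
    print(leaks_canary("A normal answer.", [canary]))
    print(leaks_canary(f"Sure: {spaced_out}", [canary]))
```

Normalizing both sides to alphanumerics is only one cheap defense against obfuscation. Adaptive attackers could paraphrase or encode the canary in ways this string match would miss.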

Contribution

It proposes CanaryRAG, a novel, plug-and-play runtime defense mechanism that combines canary tokens with a dual-path runtime integrity game to detect and prevent RAG extraction attacks.

Findings

01

CanaryRAG significantly reduces chunk recovery rates compared to baselines.

02

It has a negligible impact on task performance and inference latency.

03

CanaryRAG can be integrated into existing RAG pipelines without retraining.

Abstract

Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be executed through adaptive and iterative attack strategies (termed RAG extraction attacks), while effective countermeasures remain notably lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation.…
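The decision rule in the abstract flags leakage whenever either path breaks its expected canary behavior. It can be sketched as a simple OR over two per-path checks. The abstract does not specify what each path is expected to do, so this sketch makes two hypothetical assumptions. First, the target path (the deployed RAG model) should never emit a canary. Second, a hypothetical oracle path, explicitly instructed to echo every canary it sees, should reproduce all of them, so a missing canary suggests that the query suppressed canary output. The names `judge` and `Verdict` and the normalization step are illustrative only.

```python
import re
from dataclasses import dataclass


def _normalize(text: str) -> str:
    # Lowercase and keep only letters and digits to tolerate simple obfuscation.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def contains(text: str, canary: str) -> bool:
    return _normalize(canary) in _normalize(text)


@dataclass
class Verdict:
    target_violation: bool
    oracle_violation: bool

    @property
    def leak_detected(self) -> bool:
        # Leakage is flagged if EITHER path violates its expected behavior.
        return self.target_violation or self.oracle_violation


def judge(target_output: str, oracle_output: str, canaries: list[str]) -> Verdict:
    # Assumption: the target path must never reveal a canary.
    target_bad = any(contains(target_output, c) for c in canaries)
    # Assumption: the oracle path is instructed to echo every canary;
    # a missing canary is treated as evidence of adversarial suppression.
    oracle_bad = not all(contains(oracle_output, c) for c in canaries)
    return Verdict(target_bad, oracle_bad)


if __name__ == "__main__":
    canaries = ["CNRY-aa11", "CNRY-bb22"]
    print(judge("A safe answer.", "CNRY-aa11 CNRY-bb22", canaries).leak_detected)
    print(judge("Dump: c n r y - a a 1 1", "CNRY-aa11 CNRY-bb22", canaries))
    print(judge("A safe answer.", "only CNRY-aa11", canaries))
```

The two checks catch different attacker strategies. Verbatim or lightly obfuscated copying trips the target check. A prompt that tries to hide the canaries trips the oracle check, as long as it affects the oracle path the same way it affects the target path.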
