Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs

Yepeng Liu; Xuandong Zhao; Dawn Song; Yuheng Bu

arXiv:2502.10673·cs.CR·February 18, 2025

Open Access · 3 Reviews

TL;DR

This paper proposes inserting watermarked canary documents into a dataset to protect its ownership in Retrieval-Augmented LLMs, enabling detection of unauthorized dataset usage without impairing system performance.

Contribution

It introduces a new watermarking approach with synthetic canary documents for dataset protection in RAG systems, ensuring stealthiness, detectability, and minimal data perturbation.

Findings

01

High query efficiency in detecting unauthorized use

02

Watermarked canaries are stealthy and statistically provable

03

Minimal impact on RAG system performance

Abstract

Retrieval-Augmented Generation (RAG) has become an effective method for enhancing large language models (LLMs) with up-to-date knowledge. However, it poses a significant risk of IP infringement, as IP datasets may be incorporated into the knowledge database by malicious Retrieval-Augmented LLMs (RA-LLMs) without authorization. To protect the rights of the dataset owner, an effective dataset membership inference algorithm for RA-LLMs is needed. In this work, we introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by the RA-LLMs. Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset. These canary documents are created with synthetic content and embedded watermarks to ensure uniqueness, stealthiness, and statistical provability.…
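The abstract does not say which watermarking scheme or test statistic the canaries use, so the following is only a minimal sketch of what "statistical provability" can look like, assuming a keyed green-list watermark with a one-sided z-test. The names `is_green`, `make_green_text`, `watermark_z_score`, and `p_value`, the whitespace tokenization, and the value `GAMMA = 0.5` are all illustrative assumptions, not the authors' implementation.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of tokens that count as "green" for any context


def is_green(prev_token: str, token: str, key: str, gamma: float = GAMMA) -> bool:
    """Keyed pseudo-random partition: is `token` green given the previous token?"""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < gamma


def make_green_text(vocab, length, key, seed_token="<s>"):
    """Toy stand-in for a watermarked LLM: always emit a green token.

    A real generator would only bias sampling toward green tokens.
    """
    tokens = [seed_token]
    for _ in range(length):
        # Pick the first vocabulary word that is green in this context.
        nxt = next(w for w in vocab if is_green(tokens[-1], w, key))
        tokens.append(nxt)
    return " ".join(tokens)


def watermark_z_score(text: str, key: str, gamma: float = GAMMA) -> float:
    """One-sided z-score of the green-token count against the null rate gamma."""
    tokens = text.split()
    n = len(tokens) - 1  # number of scored (previous, current) token pairs
    if n < 1:
        return 0.0
    green = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    vocab = [f"w{i}" for i in range(50)]  # hypothetical toy vocabulary
    canary = make_green_text(vocab, length=40, key="owner-secret")
    z = watermark_z_score(canary, key="owner-secret")
    print(f"z = {z:.2f}, p = {p_value(z):.2e}")
```

Under these assumptions, the dataset owner would query the suspect RA-LLM, score its responses with the secret key, and treat a small p-value as evidence that canary text reached the generator. Text produced without the key should show a green fraction close to `GAMMA`.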

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. Addresses a timely problem: protecting the intellectual property of external knowledge bases used by RAG systems.
2. The results on various datasets are strong.
3. The presentation is easy to understand.

Weaknesses

1. There is no theoretical guarantee that the unique content (e.g., "VitalityBoost and ExerciseShield") will trigger retrieval of the synthetic paragraph. For example, what happens if this unique content (e.g., "VitalityBoost and ExerciseShield") also exists in other external knowledge bases that are combined with the protected dataset?
2. How do you verify ownership from answers containing the synthetic content? It is possible that the user claims that their knowledge base contains t

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The paper introduces a black-box dataset protection framework that preserves the integrity of the original IP dataset while achieving high detection accuracy through LLM-based watermarking in synthetic canary documents.
- The authors conduct extensive experiments demonstrating strong quantitative performance, including high retrieval accuracy with minimal queries and negligible impact on downstream RAG tasks.

Weaknesses

- The proposed method lacks clear novelty. Its core idea primarily relies on applying existing watermarking techniques within the RAG framework. While the implementation is well-executed, the approach essentially extends known watermarking methods to a familiar setting without introducing fundamentally new algorithms or theoretical insights.
- The method used to detect the watermark also lacks clear novelty. The paper primarily applies an existing detection algorithm to detect the watermark.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The DMI-RAG task is well-posed: it keeps the original dataset untouched, avoids the quality regressions common to paraphrase-based watermarking, and the canaries provide separable evidence.
2. The work provides a principled synthesis pipeline.
3. Detection comes with statistical guarantees.
4. Extensive experiments and strong empirical results under realistic constraints.

Weaknesses

1. A broader evaluation of robustness is needed. How does detection performance change if the generator performs an aggressive attack, e.g., paraphrasing, before answering?
2. The experiments show the effectiveness of the watermarked canary on text RAG. Can you discuss its potential application to non-text RAG, e.g., images?

Taxonomy

Topics: Advanced Data Storage Technologies

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · Attention Dropout · BART · Layer Normalization