Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning

Junfeng Guo; Yiming Li; Ruibo Chen; Yihan Wu; Chenxi Liu; Yanshuo Chen; Heng Huang

arXiv:2502.10440 · cs.CR · May 26, 2025

TL;DR

This paper introduces \name{}, a novel method for copyright protection of knowledge bases in retrieval-augmented language models. It embeds benign verification behaviors in the reasoning process, so ownership can be verified without altering the final answers.

Contribution

The paper proposes a new watermarking technique that embeds verification behaviors in reasoning steps, maintaining answer correctness and resisting adaptive attacks.

Findings

01. Effective protection of knowledge bases is demonstrated across benchmarks.

02. The watermark is highly resistant to adaptive attacks and to anomaly detection.

03. Answer accuracy is maintained while the watermark is embedded.

Abstract

Large language models (LLMs) are increasingly integrated into real-world personalized applications through retrieval-augmented generation (RAG) mechanisms, which supplement their responses with domain-specific knowledge. However, the valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized use by adversaries. Existing methods that can be generalized as watermarking techniques for protecting these knowledge bases typically rely on poisoning or backdoor attacks. However, these methods require altering the LLM's outputs on verification samples, which inevitably makes the watermarks susceptible to anomaly detection and can even introduce new security risks. To address these challenges, we propose \name{} for 'harmless' copyright protection of knowledge bases. Instead of manipulating the LLM's final output, \name{} implants distinct yet benign…
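The abstract describes verification only at a high level. To illustrate why a behavior in the reasoning, rather than an altered answer, can still support an ownership claim, the sketch below assumes a defender who can query a suspect RAG system with hypothetical verification questions, read its reasoning traces, and count how often a hypothetical marker behavior appears (reduced here to a fixed phrase). A one-sided exact binomial test against an assumed baseline rate `p0` for clean systems then decides whether the protected knowledge base was likely used. The marker phrase, `p0`, `alpha`, the helper names, and the choice of a binomial test are all illustrative assumptions, not the paper's verification procedure.

```python
from math import comb


def binomial_tail(k: int, n: int, p0: float) -> float:
    """Exact one-sided tail P(X >= k) for X ~ Binomial(n, p0).

    Interpreted here as the chance that a system NOT built on the protected
    knowledge base shows the marker behavior in at least k of n traces.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def verify_ownership(
    traces: list[str],
    marker: str,
    p0: float = 0.05,  # assumed baseline rate of the marker in clean systems
    alpha: float = 0.01,  # assumed significance level
) -> tuple[bool, float]:
    """Flag a suspect system if its reasoning traces for the (hypothetical)
    verification queries show the marker far more often than chance.

    Only the reasoning traces are inspected; final answers are not compared,
    mirroring the idea that the watermark should leave answers unchanged.
    """
    hits = sum(marker.lower() in trace.lower() for trace in traces)
    p_value = binomial_tail(hits, len(traces), p0)
    return p_value < alpha, p_value


# Hypothetical reasoning traces collected from a suspect system.
suspect_traces = ["Before answering, cross-check with the archive ledger."] * 18 + [
    "Direct reasoning from the retrieved passage."
] * 2
flagged, p_value = verify_ownership(suspect_traces, marker="archive ledger")
print(flagged, p_value)  # flagged is True: 18 of 20 hits is implausible at p0 = 0.05
```

With 18 of 20 traces showing the marker, the p-value is far below `alpha`, so the system is flagged. A single coincidental hit out of 20 gives a p-value of about 0.64 (that is, 1 - 0.95^20) and is not flagged. A statistical test is used rather than a fixed hit threshold so that the false-accusation rate against clean systems is bounded by `alpha` under the assumed `p0`.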

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · BART · WordPiece · Layer Normalization