RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models

Peizhuo Lv; Mengjie Sun; Hao Wang; Xiaofeng Wang; Shengzhi Zhang; Yuxuan Chen; Kai Chen; Limin Sun

arXiv:2501.05249·cs.CR·January 10, 2025

TL;DR

This paper introduces RAG-WM, a novel black-box watermarking method for retrieval-augmented generation systems that effectively detects intellectual property theft across various large language models and withstands common attacks.

Contribution

It proposes a new black-box watermarking approach for RAGs using multi-LLM interactions, addressing limitations of existing white-box and text-based watermarks.

Findings

1. RAG-WM effectively detects stolen RAGs across multiple LLMs.
2. It is robust against paraphrasing and content modification attacks.
3. RAG-WM can evade existing watermark detection methods.

Abstract

In recent years, tremendous success has been witnessed in Retrieval-Augmented Generation (RAG), widely used to enhance Large Language Models (LLMs) in domain-specific, knowledge-intensive, and privacy-sensitive tasks. However, attackers may steal those valuable RAGs and deploy or commercialize them, making it essential to detect Intellectual Property (IP) infringement. Most existing ownership protection solutions, such as watermarks, are designed for relational databases and texts. They cannot be directly applied to RAGs because relational database watermarks require white-box access to detect IP infringement, which is unrealistic for the knowledge base in RAGs. Meanwhile, post-processing by the adversary's deployed LLMs typically destroys text watermark information. To address those problems, we propose a novel black-box "knowledge watermark" approach, named RAG-WM, to detect IP…

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Vehicle License Plate Recognition

Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout