Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Michael-Andrei Panaitescu-Liess; Zora Che; Bang An; Yuancheng Xu; Pankayaraj Pathmanathan; Souradip Chakraborty; Sicheng Zhu; Tom Goldstein; Furong Huang

arXiv:2407.17417 · cs.LG · June 6, 2025

TL;DR

This paper explores watermarking of large language models as a way to prevent copyright infringement, analyzes its effectiveness and its impact on privacy attacks (membership inference), and proposes an adaptive method that improves membership inference under watermarking.

Contribution

It provides a theoretical and empirical evaluation of watermarking in LLMs, revealing its effects on copyright protection and on membership inference attacks, and introduces an adaptive technique that increases membership inference attack success on watermarked models.

Findings

01

Watermarking reduces copyrighted text generation in LLMs.

02

Watermarking hampers the effectiveness of membership inference attacks.

03

Adaptive methods can improve attack success rates under watermarking.

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis and empirical evaluation, we demonstrate that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in the deployment of LLMs. However, we also find that watermarking can have unintended consequences on Membership Inference Attacks (MIAs), which aim to discern whether a sample was part of the pretraining dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking…
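The intuition behind the first claim can be illustrated with a toy model. The sketch below assumes a KGW-style soft watermark (Kirchenbauer et al., 2023: a green list seeded by the previous token, with a bias `DELTA` added to green-token logits); the excerpt above does not say which watermarking schemes the paper evaluates, and the vocabulary size, `GAMMA`, `DELTA`, and the "memorizing" model are hypothetical choices for illustration only. It compares the log-probability of reproducing a memorized sequence verbatim with and without the watermark.

```python
import hashlib
import math
import random

# Hypothetical toy settings, not taken from the paper.
VOCAB_SIZE = 50  # size of the toy vocabulary
GAMMA = 0.5      # fraction of the vocabulary placed on each green list
DELTA = 2.0      # logit bias added to green tokens


def green_list(prev_token: int, vocab_size: int = VOCAB_SIZE, gamma: float = GAMMA) -> set[int]:
    """Pseudorandom green list seeded by a hash of the previous token (KGW-style)."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    tokens = list(range(vocab_size))
    rng.shuffle(tokens)
    return set(tokens[: int(gamma * vocab_size)])


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sequence_log_prob(seq: list[int], logits_fn, watermark: bool, delta: float = DELTA) -> float:
    """Sum of log p(seq[i] | seq[:i]) for i >= 1, optionally with the green-list bias."""
    total = 0.0
    for i in range(1, len(seq)):
        logits = list(logits_fn(seq[:i]))
        if watermark:
            greens = green_list(seq[i - 1])
            logits = [x + delta if t in greens else x for t, x in enumerate(logits)]
        total += log_softmax(logits)[seq[i]]
    return total


def make_memorizing_model(memorized: list[int], strength: float = 5.0):
    """Hypothetical model that strongly prefers the next token of a memorized text."""
    def logits_fn(prefix: list[int]) -> list[float]:
        target = memorized[len(prefix)]
        return [strength if t == target else 0.0 for t in range(VOCAB_SIZE)]
    return logits_fn


rng = random.Random(0)
memorized = [rng.randrange(VOCAB_SIZE) for _ in range(200)]
model = make_memorizing_model(memorized)

plain = sequence_log_prob(memorized, model, watermark=False)
marked = sequence_log_prob(memorized, model, watermark=True)
print(f"log p(copy) without watermark: {plain:.1f}")
print(f"log p(copy) with watermark:    {marked:.1f}")
```

Because a memorized text was not written to follow the green lists, roughly half of its tokens land on red lists and lose probability mass, so the probability of emitting the whole sequence verbatim drops sharply as the sequence grows. The same logit shift also changes per-token losses, which is one plausible reading of why loss-based membership inference is affected; this sketch does not model the paper's adaptive attack.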
