No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
Qi Pang, Shengyuan Hu, Wenting Zheng, Virginia Smith

TL;DR
This paper demonstrates that designing watermarking schemes for large language models involves fundamental trade-offs between robustness, utility, and usability, and provides guidelines and defenses to address these challenges.
Contribution
The paper reveals inherent trade-offs in LLM watermarking design and offers practical guidelines and defenses to improve watermark robustness against attacks.
Findings
Common watermarking schemes are vulnerable to simple attacks.
Trade-offs exist between robustness, utility, and usability in watermark design.
Proposed defenses improve watermark resilience in practice.
Abstract
Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack -- leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.
Taxonomy
Topics: Cryptography and Data Security · Digital Rights Management and Security · Advanced Data Storage Technologies
Methods: Sparse Evolutionary Training
