A Unified Framework for LLM Watermarks
Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev

TL;DR
This paper introduces a unified, principled framework for LLM watermarks based on constrained optimization, enabling better understanding, comparison, and design of watermarking schemes with improved detection power.
Contribution
The paper provides the first general formulation for LLM watermarking, unifies existing methods under it, and offers a systematic approach for designing new schemes tailored to specific requirements.
Findings
Most watermarking schemes can be derived from the proposed optimization framework.
The framework reveals a trade-off between quality, diversity, and power.
Watermarks designed within this framework maximize detection power under given constraints.
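To make the tension between quality and power concrete, here is a minimal Python sketch. It is not the paper's formulation: it assumes a commonly used green-list scheme in which a fixed boost `delta` is added to the logits of a random half of the vocabulary, and it uses a toy next-token distribution. The names `boost_green`, `green_mass`, and `kl_divergence` are hypothetical helpers introduced for illustration. Probability mass on green tokens stands in for detection signal, and KL divergence from the original distribution stands in for quality loss; diversity is not modeled here.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boost_green(probs, green, delta):
    """Add delta to the logits of green tokens, then renormalize.

    Equivalent to multiplying green-token probabilities by exp(delta).
    """
    weights = [p * math.exp(delta) if i in green else p
               for i, p in enumerate(probs)]
    total = sum(weights)
    return [w / total for w in weights]


def green_mass(probs, green):
    """Probability of sampling a green token (proxy for detection signal)."""
    return sum(p for i, p in enumerate(probs) if i in green)


def kl_divergence(q, p):
    """KL(q || p), used here as a rough proxy for distortion."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Toy setup (assumed, not from the paper): 50-token vocabulary, random logits,
# and a random half of the vocabulary marked green.
rng = random.Random(0)
vocab_size = 50
probs = softmax([rng.gauss(0.0, 2.0) for _ in range(vocab_size)])
green = set(rng.sample(range(vocab_size), vocab_size // 2))

results = []
for delta in [0.0, 1.0, 2.0, 4.0]:
    q = boost_green(probs, green, delta)
    results.append((delta, green_mass(q, green), kl_divergence(q, probs)))

for delta, mass, kl in results:
    print(f"delta={delta:.1f}  green mass={mass:.3f}  KL={kl:.4f}")
```

In this toy setting, raising `delta` increases both the green-token mass and the divergence from the original distribution, so a stronger signal costs more distortion. A constrained formulation of the kind the paper describes would instead fix a budget on the distortion term and seek the strongest signal within it.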
Abstract
LLM watermarks allow tracing AI-generated texts by inserting a detectable signal into their generated content. Recent works have proposed a wide range of watermarking algorithms, each with distinct designs, usually built using a bottom-up approach. Crucially, there is no general and principled formulation for LLM watermarking. In this work, we show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our formulation unifies existing watermarking methods and explicitly reveals the constraints that each method optimizes. In particular, it highlights an understudied quality-diversity-power trade-off. At the same time, our framework also provides a principled approach for designing novel watermarking schemes tailored to specific requirements. For instance, it allows us to directly use perplexity as a proxy for…
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
