A Watermark for Large Language Models
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

TL;DR
This paper introduces a novel watermarking technique for large language models that embeds detectable signals into generated text with minimal quality impact, enabling secure and efficient identification of model outputs.
Contribution
It presents a watermarking framework for proprietary language models. The watermark can be detected from a short span of tokens without access to the model's API or parameters, and the framework is designed to be robust against attacks.
Findings
Watermark detection is efficient and does not need model API access.
Embedding the watermark has a negligible impact on text quality.
The framework is tested successfully on large-scale models from the OPT family.
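Detection is cheap because it only counts tokens. Given the secret key, a detector recomputes each position's "green" list and counts how many of the T scored tokens fall in it. The following is a sketch of the one-proportion z-test described in the paper, as we understand it. Here γ is the fraction of the vocabulary placed on each green list and |s|_G is the number of green tokens. If the text was written without knowledge of the green lists, each token is green with probability γ, so |s|_G has mean γT and variance Tγ(1−γ):

```latex
z = \frac{|s|_G - \gamma T}{\sqrt{T\,\gamma\,(1-\gamma)}},
\qquad
p = 1 - \Phi(z)
```

A large z (small one-sided p-value under the normal approximation, with Φ the standard normal CDF) is evidence that the text was generated with the watermark. No model forward pass is needed at any point.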
Abstract
Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open…
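The embedding and detection steps in the abstract can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation. The green list is a pseudorandom γ fraction of the vocabulary, seeded from the previous token. The "soft" promotion adds a constant δ to the green tokens' logits before sampling, and detection computes the z-score of the green-token count. The names `green_list`, `watermarked_sample`, `detect_z`, and `p_value`, the SHA-256 keyed seeding, the key, and the uniform-logit "model" are all hypothetical stand-ins.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float, key: bytes) -> set[int]:
    """Pseudorandom green list of size gamma*|V|, seeded by the previous token.

    The SHA-256 keyed seeding is an illustrative assumption, not the paper's exact hash.
    """
    digest = hashlib.sha256(key + prev_token.to_bytes(8, "little")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "little"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def watermarked_sample(logits: list[float], prev_token: int, gamma: float,
                       delta: float, key: bytes, rng: random.Random) -> int:
    """Softly promote green tokens by adding delta to their logits, then sample."""
    greens = green_list(prev_token, len(logits), gamma, key)
    boosted = [x + delta if i in greens else x for i, x in enumerate(logits)]
    top = max(boosted)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - top) for x in boosted]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def detect_z(tokens: list[int], vocab_size: int, gamma: float, key: bytes) -> float:
    """z-score of the green-token count; needs only the key, not the model."""
    t = len(tokens) - 1  # the first token only seeds the first green list
    hits = sum(tok in green_list(prev, vocab_size, gamma, key)
               for prev, tok in zip(tokens, tokens[1:]))
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def p_value(z: float) -> float:
    """One-sided p-value under the standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy usage: a hypothetical "model" with uniform logits over a 1,000-token vocabulary.
VOCAB, GAMMA, DELTA, KEY = 1000, 0.25, 2.0, b"hypothetical-secret-key"
sampler = random.Random(0)
watermarked = [0]  # hypothetical prompt token
for _ in range(200):
    watermarked.append(watermarked_sample([0.0] * VOCAB, watermarked[-1],
                                          GAMMA, DELTA, KEY, sampler))
plain = [0] + [sampler.randrange(VOCAB) for _ in range(200)]

z_wm = detect_z(watermarked, VOCAB, GAMMA, KEY)
z_plain = detect_z(plain, VOCAB, GAMMA, KEY)
print(f"watermarked z={z_wm:.1f}, p={p_value(z_wm):.1e}")
print(f"unwatermarked z={z_plain:.1f}")
```

With δ = 2 and γ = 0.25, about 71% of the toy sampler's tokens land on green lists, compared with the 25% expected by chance. The watermarked sequence therefore scores far above the unwatermarked one. Because δ only shifts logits rather than forbidding red tokens, a real model can still choose a red token when it is highly confident. This is the sense in which the promotion is "soft".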
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
Methods: Multi-Head Attention · Attention Is All You Need · Test · Dense Connections · Adam · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Absolute Position Encodings · Dropout
