A Watermark for Large Language Models

John Kirchenbauer; Jonas Geiping; Yuxin Wen; Jonathan Katz; Ian Miers; Tom Goldstein

arXiv:2301.10226 · cs.LG · May 3, 2024 · 113 cites

TL;DR

This paper introduces a novel watermarking technique for large language models that embeds detectable signals into generated text with minimal quality impact, enabling secure and efficient identification of model outputs.

Contribution

The paper presents a watermarking framework for proprietary language models: the watermark is easy to detect, detection requires no access to the model API or parameters, and the scheme is described as robust against attacks.

Findings

01. Watermark detection is efficient and does not need model API access.

02. The watermark has negligible impact on text quality.

03. The framework is tested successfully on large-scale models from the OPT family.

Abstract

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open…
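
The mechanism described in the abstract (choose a randomized green set before each token, softly promote green tokens during sampling, then test for an excess of green tokens) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the text above: the toy vocabulary size, the green fraction `GAMMA`, the logit bias `DELTA`, the key `KEY`, seeding the green set from a hash of the previous token, the random logits standing in for a language model, and the one-proportion z-test used for detection. It should not be read as the authors' implementation.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000   # hypothetical toy vocabulary
GAMMA = 0.5         # assumed fraction of the vocabulary marked green
DELTA = 4.0         # assumed bias added to green-token logits
KEY = b"demo-key"   # hypothetical secret key shared by generator and detector


def green_set(prev_token):
    """Pseudo-randomly choose the green tokens from the key and the previous token."""
    digest = hashlib.sha256(KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_token(logits, prev_token, rng, watermark):
    """Sample from softmax(logits), optionally boosting green tokens by DELTA."""
    if watermark:
        green = green_set(prev_token)
        logits = [x + DELTA if i in green else x for i, x in enumerate(logits)]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]  # unnormalized softmax
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(n_tokens, seed, watermark):
    """Generate tokens from random logits (a stand-in for a real language model)."""
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_token(logits, tokens[-1], rng, watermark))
    return tokens


def detect(tokens):
    """Return (z, p) for a one-sided test of the null hypothesis 'no watermark'.

    Only the key and the token sequence are needed, not the model.
    """
    scored = len(tokens) - 1
    hits = sum(tok in green_set(prev) for prev, tok in zip(tokens, tokens[1:]))
    z = (hits - GAMMA * scored) / math.sqrt(scored * GAMMA * (1.0 - GAMMA))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p


if __name__ == "__main__":
    for flag in (True, False):
        z, p = detect(generate(200, seed=1, watermark=flag))
        print(f"watermark={flag}: z={z:.2f}, p={p:.3g}")
```

In this sketch the detector recomputes each green set from the key and the preceding token, so it never queries the generator. Under the null hypothesis each scored token is green with probability `GAMMA`, which is what makes the resulting p-value interpretable.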

Code & Models

Datasets

society-ethics/BlogPostOpenness · dataset · 24 downloads

Videos

A Watermark for Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning

Methods: Multi-Head Attention · Attention Is All You Need · Test · Dense Connections · Adam · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Absolute Position Encodings · Dropout