Publicly-Detectable Watermarking for Language Models
Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Mingyuan Wang

TL;DR
This paper introduces a publicly-detectable watermarking scheme for language models that embeds a publicly-verifiable cryptographic signature into model output. The scheme is proven unforgeable and distortion-free (undetectable without the public key), uses error-correction to get through low-entropy periods, and is implemented to check that the formal claims hold in practice.
Contribution
The paper proposes a watermarking scheme for language models that anyone can verify without secret information, that is unforgeable, and that uses error-correction to remain usable through low-entropy periods. The authors also implement the scheme.
Findings
The watermarking scheme is unforgeable and distortion-free.
The scheme is effective even during low-entropy periods.
Practical implementation confirms theoretical properties.
Abstract
We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
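The rejection-sampling idea in the abstract can be illustrated with a toy sketch. This is not the authors' construction: the vocabulary, the uniform "model", the chunk length, and the function names `chunk_bit`, `embed_bits`, and `extract_bits` are all assumptions made for illustration. The payload is an arbitrary bit list standing in for signature bits; a real scheme would derive those bits from a public-key signature and verify them with the public key. The sketch only shows how resampling chunks of text until a public hash of each chunk matches a target bit lets anyone read the bits back without a secret.

```python
import hashlib
import random

# Toy vocabulary standing in for a language model's token set (assumption).
VOCAB = ["the", "a", "model", "text", "signs", "output", "token", "key"]


def chunk_bit(tokens):
    """Map a chunk of tokens to one bit using a public hash (no secret needed)."""
    digest = hashlib.sha256(" ".join(tokens).encode("utf-8")).digest()
    return digest[0] & 1


def sample_chunk(rng, chunk_len):
    """Stand-in for sampling the next chunk_len tokens from a model (assumption)."""
    return [rng.choice(VOCAB) for _ in range(chunk_len)]


def embed_bits(bits, rng, chunk_len=4, max_tries=1000):
    """Rejection sampling: resample each chunk until its hash bit equals the target."""
    out = []
    for bit in bits:
        for _ in range(max_tries):
            chunk = sample_chunk(rng, chunk_len)
            if chunk_bit(chunk) == bit:
                out.extend(chunk)
                break
        else:
            # No acceptable chunk was found, e.g. because the sampler has too little entropy.
            raise RuntimeError("could not embed bit")
    return out


def extract_bits(tokens, chunk_len=4):
    """Public detection step: recompute the hash bit of every full chunk."""
    last_start = len(tokens) - chunk_len + 1
    return [chunk_bit(tokens[i:i + chunk_len]) for i in range(0, last_start, chunk_len)]


# Usage: the payload is a placeholder for signature bits (assumption).
payload = [1, 0, 1, 1, 0, 0, 1, 0]
text = embed_bits(payload, random.Random(0))
recovered = extract_bits(text)
```

In this toy version a chunk that cannot be made to match simply raises an error. The abstract states that the actual scheme uses error-correction to overcome low-entropy periods, where resampling alone may not produce a matching chunk.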
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Hate Speech and Cyberbullying Detection · Privacy-Preserving Technologies in Data
