Publicly-Detectable Watermarking for Language Models

Jaiden Fairoze; Sanjam Garg; Somesh Jha; Saeed Mahloujifar; Mohammad Mahmoody; Mingyuan Wang

arXiv:2310.18491·cs.LG·January 7, 2025·1 cites

TL;DR

This paper introduces a publicly-detectable watermarking scheme for language models that embeds a publicly-verifiable cryptographic signature into model output using rejection sampling. The output is provably unforgeable and distortion-free, error-correction carries the scheme through periods of low entropy, and an implementation bears out the formal claims.

Contribution

The paper proposes a watermarking scheme for language models whose detection algorithm contains no secret information and can be run by anyone. The watermark is unforgeable, withstands low-entropy periods through error-correction, and is backed by a working implementation.

Findings

01 The watermarking scheme is unforgeable and distortion-free.

02 The scheme remains effective during low-entropy periods.

03 A practical implementation confirms the theoretical properties.

Abstract

We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
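
The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the `sample_block` stand-in for a language model, the fixed block length, and the one-bit-per-block hash are all assumptions made for illustration, and the payload is a placeholder for the bits of a real public-key signature. The sketch shows only how rejection sampling can force a public hash of each block of tokens to match the next payload bit, so that anyone can read the bits back without a secret.

```python
import hashlib
import random

# Toy vocabulary standing in for a language model's token set (assumption).
VOCAB = ["the", "a", "model", "text", "token", "signs", "output", "key"]


def block_bit(block):
    """Publicly computable map from a block of tokens to a single bit."""
    digest = hashlib.sha256(" ".join(block).encode("utf-8")).digest()
    return digest[0] & 1


def sample_block(rng, block_len):
    """Hypothetical stand-in for drawing block_len tokens from a model."""
    return [rng.choice(VOCAB) for _ in range(block_len)]


def embed_bits(bits, rng, block_len=4, max_tries=1000):
    """Resample each block until its hash bit equals the next payload bit."""
    tokens = []
    for bit in bits:
        for _ in range(max_tries):
            block = sample_block(rng, block_len)
            if block_bit(block) == bit:
                tokens.extend(block)
                break
        else:
            # With too little entropy, no acceptable block may ever appear.
            raise RuntimeError("could not embed bit: not enough entropy")
    return tokens


def extract_bits(tokens, block_len=4):
    """Recover the payload using only public information."""
    return [
        block_bit(tokens[i:i + block_len])
        for i in range(0, len(tokens), block_len)
    ]


payload = [1, 0, 1, 1, 0, 0, 1, 0]  # placeholder for signature bits
text = embed_bits(payload, random.Random(0))
assert extract_bits(text) == payload
```

The `RuntimeError` branch marks where low entropy becomes a problem: if the sampler keeps producing the same block, rejection sampling cannot reach the required bit. The abstract states that the scheme uses error-correction to get past such periods; that part is not modeled here.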

Code & Models

Repositories

jfairoze/publicly-detectable-watermark
PyTorch · Official

Taxonomy

Topics: Internet Traffic Analysis and Secure E-voting · Hate Speech and Cyberbullying Detection · Privacy-Preserving Technologies in Data