A Watermark for Black-Box Language Models

Dara Bahri; John Wieting

arXiv:2410.02099 · cs.CR · February 24, 2026

TL;DR

This paper introduces a black-box watermarking scheme for large language models that requires only the ability to sample sequences from the model. The scheme is distortion-free, can be chained or nested using multiple secret keys, and outperforms some white-box methods in experiments.

Contribution

A new watermarking method for LLMs that needs only black-box (sequence-sampling) access, comes with performance guarantees, and can be chained or nested using multiple secret keys.

Findings

01. Effective black-box watermark detection demonstrated

02. Outperforms some white-box schemes in experiments

03. Supports chaining and nested watermarking

Abstract

Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.
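To make the black-box idea concrete, here is a minimal sketch of watermarking with sequence-sampling access only: draw several candidate sequences, score each with a keyed hash of its n-grams, and keep the highest-scoring one. This is an illustrative assumption, not the authors' algorithm; the function names, the HMAC-based scoring, the mean-over-n-grams statistic, and the toy sampler are all hypothetical, and the sketch makes no claim to reproduce the paper's distortion-free guarantee.

```python
import hashlib
import hmac
import random


def keyed_uniform(key: bytes, ngram: tuple) -> float:
    """Map an n-gram to a pseudorandom value in [0, 1) via HMAC-SHA256."""
    msg = "\x1f".join(ngram).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def score(key: bytes, tokens: list, n: int = 2) -> float:
    """Mean keyed value over the distinct n-grams of a token sequence."""
    ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    if not ngrams:
        return 0.5  # neutral score when the sequence is too short
    return sum(keyed_uniform(key, g) for g in ngrams) / len(ngrams)


def watermarked_sample(sample_fn, key: bytes, m: int = 8, n: int = 2) -> list:
    """Draw m candidates from a black-box sampler; keep the top-scoring one."""
    candidates = [sample_fn() for _ in range(m)]
    return max(candidates, key=lambda toks: score(key, toks, n))


def make_toy_sampler(seed: int = 0, length: int = 20):
    """Hypothetical stand-in for an LLM API: returns random token lists."""
    rng = random.Random(seed)
    vocab = ["the", "cat", "sat", "on", "a", "mat", "and", "slept"]

    def sample_fn() -> list:
        return [rng.choice(vocab) for _ in range(length)]

    return sample_fn
```

In this sketch, a detector holding the same key would recompute `score` on a text and flag values well above what unwatermarked text typically produces; how the threshold is set, and how the score is calibrated for length, is left open here.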

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 1 · Confidence 5

Strengths

- While it is based on a generalization of ideas from existing schemes, the exact scheme proposed is to the best of my knowledge novel. The authors do a good job of exploring different variants of the scheme (e.g., CDF) in a principled way.
- The theoretical results are sound. I especially appreciate that Theorem 4.2 is carefully placed into context and analyzed for various input values to demonstrate its implications.
- Experiments are very thorough, involve important aspects such as quality

Weaknesses

As a meta point, the authors are using the 2024 style file and should update it to the latest version to avoid desk rejection. I understand that this is an honest mistake, but in particular the lack of the usual line numbers makes it hard to refer to particular parts of the writeup. The weaknesses of the paper are, in my view: (1) Limitations of the evaluation setup - The authors recognize that AUC is not the most practically relevant metric, yet resolve this by proposing a new metric (AUC below

Reviewer 02 · Rating 3 · Confidence 3

Strengths

The paper seems to do a good job of optimizing both their scheme, and the schemes they compare against. In particular, it is interesting that making the watermark detector of Aaronson length-aware improves performance as much as it does.

Weaknesses

The ideas and method are straightforward adaptations of existing work. The technique is essentially identical to Aaronson's, except that they use rejection sampling instead of the Gumbel-max trick. The scheme is also only distortion-free under certain assumptions about the text, which essentially translate to it having consistently high entropy.

Reviewer 03 · Rating 5 · Confidence 3

Strengths

The method is effective in a black-box setting. It requires only the ability to sample sequences from LLMs. The paper provides formal guarantees for detection performance.

Weaknesses

The paper’s motivation could be articulated more clearly. The main motivation stems from the security risks associated with providing API access that exposes logits to third-party users for applying their own watermark. However, simpler methods could enhance security; for instance, instead of exposing logits, LLMs could offer APIs to gather specific information users want to integrate. Furthermore, the paper presents a zero-bit watermarking technique, which only detects whether a text is waterma

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Cryptography and Data Security