Watermarking Low-entropy Generation for Large Language Models: An Unbiased and Low-risk Method

Minjia Mao; Dongjun Wei; Zeyu Chen; Xiao Fang; Michael Chau

arXiv:2405.14604 · cs.CL · February 11, 2025 · 1 citation

TL;DR

This paper introduces STA-1, an unbiased watermarking method for large language models. It maintains output quality in low-entropy scenarios, and its detection is efficient, robust, and does not require access to the model's internals.

Contribution

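The paper proposes STA-1, an unbiased watermarking technique that preserves the original token distribution in expectation, lowers the risk of unsatisfactory outputs in low-entropy scenarios compared to existing unbiased watermarks, and supports efficient, robust detection without white-box access to the LLM.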

Findings

01 STA-1 preserves the original token distribution in expectation.

02 STA-1 detection is time-efficient and robust against various watermarking attacks.

03 Experiments on low-entropy and high-entropy datasets support STA-1's effectiveness.

Abstract

Recent advancements in large language models (LLMs) have highlighted the risk of misusing them, raising the need for accurate detection of LLM-generated content. In response, a viable solution is to inject imperceptible identifiers into LLMs, known as watermarks. Our research extends the existing watermarking methods by proposing the novel Sampling One Then Accepting (STA-1) method. STA-1 is an unbiased watermark that preserves the original token distribution in expectation and has a lower risk of producing unsatisfactory outputs in low-entropy scenarios compared to existing unbiased watermarks. In watermark detection, STA-1 does not require prompts or a white-box LLM, provides statistical guarantees, demonstrates high efficiency in detection time, and remains robust against various watermarking attacks. Experimental results on low-entropy and high-entropy datasets demonstrate that…
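The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "Sampling One Then Accepting". It assumes a green/red vocabulary split (a device common in LLM watermarking, not stated on this page): draw one token from the model's distribution, keep it if it is green, otherwise draw once more and accept that draw regardless. The names `sta1_distribution` and `average_over_green_lists` are hypothetical, and the paper or the official repository remains the authority on the actual method. The sketch checks, with exact rational arithmetic on a toy low-entropy distribution, that averaging over all equally sized green lists recovers the original distribution, which is what "unbiased in expectation" means here.

```python
from fractions import Fraction
from itertools import combinations


def sta1_distribution(p, green):
    """Exact output distribution of one assumed STA-1 step for a fixed green list.

    p     : list of token probabilities (Fractions summing to 1)
    green : set of token indices treated as the green list (an assumption)
    """
    p_green = sum(p[t] for t in green)
    # Token t is emitted either by the first draw (only if t is green) or by
    # the second draw, which happens when the first draw was red.
    return [p[t] * ((1 if t in green else 0) + (1 - p_green))
            for t in range(len(p))]


def average_over_green_lists(p, green_size):
    """Average the output distribution over every green list of a given size."""
    green_lists = list(combinations(range(len(p)), green_size))
    total = [Fraction(0)] * len(p)
    for g in green_lists:
        q = sta1_distribution(p, set(g))
        total = [a + b for a, b in zip(total, q)]
    return [x / len(green_lists) for x in total]


# Toy low-entropy next-token distribution over a 4-token vocabulary.
p = [Fraction(7, 10), Fraction(2, 10), Fraction(1, 10), Fraction(0)]

# For one fixed green list the output is skewed toward green tokens...
skewed = sta1_distribution(p, {1, 2})

# ...but averaged over all green lists it equals the original distribution.
averaged = average_over_green_lists(p, green_size=2)
```

For a single fixed green list the output leans toward green tokens, which is the signal a detector could count; the skew cancels only after averaging over green lists. Exact fractions are used instead of sampling so that the equality check is not blurred by Monte Carlo noise.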

Code & Models

Repositories

djwei96/sta (PyTorch, official)

Taxonomy

Topics: Topic Modeling