Can we Watermark Low-Entropy LLM Outputs?
Noam Mazor, Andrew Morgan, Rafael Pass

TL;DR
This paper investigates watermarking schemes for low-entropy large language model outputs. The goal is to embed an identifying mark without altering the output distribution, while keeping the mark hard to remove when tokens are edited, inserted, or deleted.
Contribution
It introduces watermarking schemes effective even with low per-token entropy, expanding applicability beyond previous high-entropy assumptions.
Findings
Proposes a watermarking scheme robust against random substitutions.
Develops a scheme resilient to substitutions and deletions under certain assumptions.
Extends watermarking techniques to low-entropy LLM outputs.
Abstract
A recent and exciting thread of work focuses on developing methods for watermarking the output of large language models (LLMs). We focus on provably undetectable watermarking, that is, schemes that do not alter the output distribution of the LLM, yet enable embedding a watermark in the output that identifies the output as having been generated by the particular LLM. Furthermore, the watermark should be hard to remove by an adversary that may potentially edit, insert, or delete tokens from the watermarked output. Indeed, recent work (Christ et al. [COLT'24], Christ et al. [CRYPTO'24], Golowich et al. [NeurIPS'24]) shows how to develop such schemes that are robust against a constant fraction of substitutions, or even against a constant fraction of arbitrary edits. These works, however, make strong assumptions on the entropy present in the output of the LLM. Most notably, they all…
