Edit Distance Robust Watermarks via Indexing Pseudorandom Codes
Noah Golowich, Ankur Moitra

TL;DR
This paper introduces a watermarking scheme for language models that is undetectable and robust to adversarial edits (insertions, deletions, and substitutions). The scheme is built from indexing pseudorandom codes, which rely on weaker computational assumptions than prior constructions.
Contribution
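The paper presents a watermarking scheme with provable undetectability and robustness to a constant fraction of adversarial insertions, deletions, and substitutions. Earlier schemes could only handle stochastic substitutions and deletions, so this extends the guarantee to robustness with respect to edit distance.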
Findings
Achieves robustness to adversarial insertions, deletions, and substitutions.
Builds the scheme from indexing pseudorandom codes over large alphabets.
Relies on weaker computational assumptions than prior schemes.
Abstract
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both undetectability and robustness to edits when the alphabet size for the…
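To make the robustness notion in (b) concrete, the following is a minimal Python sketch of edit (Levenshtein) distance over token sequences, together with a check that an edited text stays within a constant fraction of edits. It is an illustration of the distance measure only, not the paper's watermarking or detection algorithm; the names `edit_distance` and `within_edit_budget` and the parameter `fraction` are hypothetical and chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions, deletions,
    and substitutions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into an empty prefix takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


def within_edit_budget(original, edited, fraction):
    """True if `edited` is within `fraction * len(original)` edits of
    `original`. `fraction` is a hypothetical stand-in for the constant
    fraction of adversarial edits mentioned in the abstract."""
    return edit_distance(original, edited) <= fraction * len(original)


# Illustrative token sequences (not taken from the paper).
original = "the quick brown fox jumps over the lazy dog".split()
edited = "the quick red fox leaps over the dog".split()
```

In this example the edited text differs from the original by two substitutions and one deletion, so `edit_distance(original, edited)` is 3. With nine original tokens, `within_edit_budget(original, edited, 0.4)` holds while `within_edit_budget(original, edited, 0.2)` does not.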
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
