Watermarking Language Models for Many Adaptive Users
Aloni Cohen, Alexander Hoover, Gabe Schoenbach

TL;DR
This paper introduces multi-user watermarking schemes for language models that enable tracing generated text to individual users or to groups of colluding users, even under adaptive prompting, enhancing robustness and privacy protections.
Contribution
It presents the first generic reduction from zero-bit to multi-user watermarking schemes and introduces AEB-robustness as a new abstraction for robustness against edits.
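To make the idea of building a multi-user watermark on top of a zero-bit one more concrete, here is a toy Python sketch of one simple reduction. It is an illustrative assumption, not the authors' construction. Each bit of an `id_bits`-bit user ID gets its own pair of zero-bit keys, and block `i` of the output is watermarked under the key that matches bit `i`. The zero-bit stand-in is a green-list watermark over a uniform toy "model". Unlike the zero-bit schemes the paper builds on, it is not undetectable. All names (`MultiUserWatermark`, `is_green`, `zero_bit_generate`, `green_fraction`, `BLOCK_LEN`, `THRESHOLD`) and parameter values are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import random

VOCAB_SIZE = 50_000  # hypothetical vocabulary size of the toy "model"
BLOCK_LEN = 64       # hypothetical number of tokens carrying one bit of the user ID
THRESHOLD = 0.8      # hypothetical green-fraction cutoff for "watermarked"


def is_green(key: bytes, prev: int, tok: int) -> bool:
    """Keyed pseudorandom half of the vocabulary, reseeded by the previous token."""
    digest = hmac.new(key, f"{prev}:{tok}".encode(), hashlib.sha256).digest()
    return digest[0] < 128


def zero_bit_generate(key: bytes, prev: int, n: int, rng: random.Random) -> list[int]:
    """Toy zero-bit scheme: resample a uniform 'model' until the token is green.
    NOT undetectable; it only stands in for a black-box zero-bit scheme."""
    out = []
    for _ in range(n):
        tok = rng.randrange(VOCAB_SIZE)
        while not is_green(key, prev, tok):
            tok = rng.randrange(VOCAB_SIZE)
        out.append(tok)
        prev = tok
    return out


def green_fraction(key: bytes, prev: int, toks: list[int]) -> float:
    """Fraction of tokens that are green under `key` (about 0.5 for unrelated text)."""
    hits = 0
    for tok in toks:
        hits += int(is_green(key, prev, tok))
        prev = tok
    return hits / len(toks)


class MultiUserWatermark:
    """Hypothetical reduction: one pair of zero-bit keys per bit of the user ID."""

    def __init__(self, id_bits: int, seed: int = 0):
        rng = random.Random(seed)  # reproducible toy keys; use `secrets` for real keys
        self.id_bits = id_bits
        # keys[i][b] watermarks block i when bit i of the user ID equals b
        self.keys = [[rng.randbytes(32) for _ in range(2)] for _ in range(id_bits)]

    def generate(self, user_id: int, rng: random.Random) -> list[int]:
        """Emit id_bits blocks; block i is watermarked under keys[i][bit i of user_id]."""
        text, prev = [], 0
        for i in range(self.id_bits):
            bit = (user_id >> i) & 1
            block = zero_bit_generate(self.keys[i][bit], prev, BLOCK_LEN, rng)
            text.extend(block)
            prev = block[-1]
        return text

    def _blocks(self, text: list[int]):
        """Yield (index, previous token, block) for each complete, aligned block."""
        for i in range(min(len(text) // BLOCK_LEN, self.id_bits)):
            start = i * BLOCK_LEN
            prev = text[start - 1] if start > 0 else 0
            yield i, prev, text[start:start + BLOCK_LEN]

    def detect(self, text: list[int]) -> bool:
        """Zero-bit question: is any block watermarked under either of its keys?"""
        return any(
            green_fraction(key, prev, block) >= THRESHOLD
            for i, prev, block in self._blocks(text)
            for key in self.keys[i]
        )

    def trace(self, text: list[int]) -> int | None:
        """Multi-user question: recover the ID bit by bit (needs all id_bits blocks)."""
        if len(text) < self.id_bits * BLOCK_LEN:
            return None
        user_id = 0
        for i, prev, block in self._blocks(text):
            scores = [green_fraction(key, prev, block) for key in self.keys[i]]
            if max(scores) < THRESHOLD:
                return None  # block i is not watermarked under either key
            user_id |= scores.index(max(scores)) << i
        return user_id


rng = random.Random(1)
wm = MultiUserWatermark(id_bits=8, seed=42)
text = wm.generate(user_id=0b10110010, rng=rng)
print(wm.detect(text[:BLOCK_LEN]))  # one block suffices to detect: True
print(wm.trace(text))               # all 8 blocks trace the user: 178
```

In this toy, detection needs only a single block, while tracing needs all `id_bits` blocks. That loosely echoes the short-snippet versus long-excerpt trade-off. However, the sketch assumes unedited, block-aligned text from a single user. It offers no robustness to edits, adaptive prompting, or collusion, which the paper's scheme is designed to handle.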
Findings
The scheme detects shorter snippets as effectively as previous methods.
It can trace longer excerpts to individual users.
The zero-bit watermarking scheme of Christ, Gunn, and Zamir (2024) is proven to be adaptively robust.
Abstract
We study watermarking schemes for language models with provable guarantees. As we show, prior works offer no robustness guarantees against adaptive prompting: when a user queries a language model more than once, as even benign users do. And with just a single exception (Christ and Gunn, 2024), prior works are restricted to zero-bit watermarking: machine-generated text can be detected as such, but no additional information can be extracted from the watermark. Unfortunately, merely detecting AI-generated text may not prevent future abuses. We introduce multi-user watermarks, which allow tracing model-generated text to individual users or to groups of colluding users, even in the face of adaptive prompting. We construct multi-user watermarking schemes from undetectable, adaptively robust, zero-bit watermarking schemes (and prove that the undetectable zero-bit scheme of Christ, Gunn, and…
