LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps
Pedro Abdalla, Roman Vershynin

TL;DR
This paper introduces two watermarking schemes for large language models: an undetectable scheme for the closed setting and an unremovable scheme for the harder open setting, where the adversary has access to most of the model. Both aim to distinguish AI-generated text from human writing.
Contribution
It presents watermarking methods for LLMs, including one that stays unremovable even when the adversary has access to most of the model.
Findings
Proposes an undetectable and elementary watermarking scheme for the closed setting.
Proposes an unremovable watermarking scheme for the harder open setting.
Targets watermark robustness against adversaries who have access to most of the model.
Abstract
Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. Also, in the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques
