LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps

Pedro Abdalla; Roman Vershynin

arXiv:2505.01484 · cs.CR · June 26, 2025

TL;DR

This paper introduces two watermarking schemes for large language models (LLMs): an undetectable scheme for the closed setting and an unremovable scheme for the open setting, where the adversary has access to most of the model. Both aim to distinguish LLM-generated text from human writing.

Contribution

The paper presents an elementary, undetectable watermarking scheme for the closed setting and an unremovable watermarking scheme for the harder open setting, in which the adversary has access to most of the model.

Findings

01. Proposes an undetectable, elementary watermarking scheme for the closed setting.

02. Proposes an unremovable watermarking scheme for the harder open setting.

03. Addresses watermark robustness against an adversary with access to most of the model.

Abstract

Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. Also, in the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques