Watermarks for Language Models via Probabilistic Automata

Yangkun Wang; Jingbo Shang

arXiv:2512.10185·cs.CR·December 12, 2025

Open Access

TL;DR

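This paper introduces a class of watermarking schemes for language models built on probabilistic automata, with two instantiations: a practical scheme offering exponential generation diversity and computational efficiency, and a theoretical construction with formal undetectability guarantees under cryptographic assumptions. Experiments on LLaMA-3B and Mistral-7B evaluate robustness and efficiency.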

Contribution

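It presents a new class of watermarking schemes with a practical and a theoretical variant. The practical variant addresses the limited generation diversity and high detection overhead of a recent distortion-free, edit-robust scheme; the theoretical variant adds formal undetectability guarantees.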

Findings

01 Exponential generation diversity achieved by the practical scheme

02 Robustness and efficiency demonstrated on LLaMA-3B and Mistral-7B

03 Theoretical construction offers formal undetectability guarantees under cryptographic assumptions

Abstract

A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high detection overhead. In parallel, recent research has focused on undetectability, a property ensuring that watermarks remain difficult for adversaries to detect and spoof. In this work, we introduce a new class of watermarking schemes constructed through probabilistic automata. We present two instantiations: (i) a practical scheme with exponential generation diversity and computational efficiency, and (ii) a theoretical construction with formal undetectability guarantees under cryptographic assumptions. Extensive experiments on LLaMA-3B and Mistral-7B validate the superior performance of our scheme in terms of robustness and efficiency.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Formal Methods in Verification · Machine Learning and Algorithms