Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Haiyun He, Yepeng Liu, Ziqiao Wang, Yongyi Mao, Yuheng Bu

TL;DR
This paper introduces a theoretically grounded, distribution-adaptive watermarking framework for LLMs that jointly optimizes the watermarking scheme and the detector, targeting high detection accuracy with minimal text distortion.
Contribution
It presents a unified theoretical framework for LLM watermarking, deriving optimal solutions and proposing a practical, distortion-free, distribution-adaptive watermarking algorithm (DAWA).
Findings
Effective at ultra-low false positive rates
Distribution-adaptive scheme improves detection performance
Validated on Llama2-13B and Mistral-8x7B models
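Detection quality at a very low false positive rate is commonly summarized as the true positive rate (TPR) at a fixed FPR. The following is a minimal sketch of that metric, not the paper's evaluation code: the function name `tpr_at_fpr` and the idea of a scalar detector score per text are assumptions made for illustration.

```python
import numpy as np

def tpr_at_fpr(null_scores, watermarked_scores, target_fpr):
    """Return (threshold, empirical FPR, empirical TPR).

    null_scores: detector scores on unwatermarked text (hypothetical input).
    watermarked_scores: detector scores on watermarked text (hypothetical input).
    A text is flagged as watermarked when its score is strictly above the threshold.
    """
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    wm = np.asarray(watermarked_scores, dtype=float)
    n = null_sorted.size

    # Number of unwatermarked samples that may be flagged without exceeding target_fpr.
    allowed = int(np.floor(target_fpr * n))

    # Use the (allowed + 1)-th largest null score as the threshold, so that
    # at most `allowed` null scores lie strictly above it.
    threshold = null_sorted[n - allowed - 1] if allowed < n else -np.inf

    fpr = float(np.mean(null_sorted > threshold))
    tpr = float(np.mean(wm > threshold))
    return threshold, fpr, tpr
```

Note that an empirical threshold like this only controls the FPR on the sampled unwatermarked texts. The worst-case FPR control described in the abstract is a stronger, distribution-free requirement, and estimating an ultra-low FPR empirically needs a correspondingly large set of unwatermarked samples.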
Abstract
Watermarking has emerged as a crucial method to distinguish AI-generated text from human-created text. Current watermarking approaches often lack formal optimality guarantees or address the scheme and detector design separately. In this paper, we introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs) that jointly optimizes both the watermarking scheme and detector. Our approach aims to maximize detection performance while maintaining control over the worst-case false positive rate (FPR) and distortion on text quality. We derive closed-form optimal solutions for this joint design and characterize the fundamental trade-off between watermark detectability and distortion. Notably, we reveal that the optimal watermarking schemes should be adaptive to the LLM's generative distribution. Building on our theoretical insights, we propose a distortion-free,…
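The joint design described in the abstract can be read as a constrained optimization problem. The formulation below is a sketch in notation assumed here, which may differ from the paper's own: $P$ is the LLM's generative distribution, $P_W$ the distribution of watermarked text, $\gamma$ the detector, $\beta_1$ the probability of missing a watermarked text, $\beta_0(Q)$ the false positive rate when unwatermarked text follows $Q$, $\mathsf{D}$ a distortion measure, and $\alpha$, $\epsilon$ the allowed FPR and distortion levels.

```latex
\begin{aligned}
\min_{P_W,\ \gamma} \quad & \beta_1(P_W, \gamma) \\
\text{subject to} \quad & \sup_{Q}\ \beta_0(Q, \gamma) \le \alpha, \\
& \mathsf{D}(P_W, P) \le \epsilon .
\end{aligned}
```

Under this reading, a distortion-free scheme corresponds to $\epsilon = 0$, and the dependence of the constraint on $P$ is what makes the optimal scheme adaptive to the LLM's generative distribution.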
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Digital Rights Management and Security · Cryptography and Data Security
