Distributional Information Embedding: A Framework for Multi-bit Watermarking
Haiyun He, Yepeng Liu, Ziqiao Wang, Yongyi Mao, Yuheng Bu

TL;DR
This paper proposes a new framework for multi-bit watermarking in large language models by controlling token distributions, analyzing the trade-offs among quality, detectability, and information rate, and identifying optimal schemes.
Contribution
It introduces the concept of distributional information embedding for LLM watermarking and develops an information-theoretic analysis to optimize watermarking schemes.
Findings
In the asymptotic regime, the maximum achievable rate with vanishing error equals the entropy of the LLM's output distribution.
This maximum rate increases as the allowable distortion grows.
Optimal watermarking schemes are characterized for both asymptotic and finite-token cases.
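To give a feel for the entropy quantity named in the findings, here is a minimal sketch that computes the Shannon entropy, in bits, of two toy next-token distributions. The four-token vocabulary, the probability values, and the helper name entropy_bits are all hypothetical illustrations. This is not the paper's watermarking scheme, only the entropy calculation that the stated rate limit refers to.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing, so skip them to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-token distributions over a 4-token vocabulary (assumed values).
peaked = [0.97, 0.01, 0.01, 0.01]    # nearly deterministic output
uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain output

h_peaked = entropy_bits(peaked)
h_uniform = entropy_bits(uniform)

print(f"peaked:  {h_peaked:.3f} bits")
print(f"uniform: {h_uniform:.3f} bits")
```

In this toy setting the uniform distribution has far higher entropy than the peaked one. Read against the first finding, this suggests that near-deterministic output leaves little room for embedded bits, while high-entropy output leaves more.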
Abstract
This paper introduces a novel problem, distributional information embedding, motivated by the practical demands of multi-bit watermarking for large language models (LLMs). Unlike traditional information embedding, which embeds information into a pre-existing host signal, LLM watermarking actively controls the text generation process--adjusting the token distribution--to embed a detectable signal. We develop an information-theoretic framework to analyze this distributional information embedding problem, characterizing the fundamental trade-offs among three critical performance metrics: text quality, detectability, and information rate. In the asymptotic regime, we demonstrate that the maximum achievable rate with vanishing error corresponds to the entropy of the LLM's output distribution and increases with higher allowable distortion. We also characterize the optimal watermarking scheme…
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Advanced Data Compression Techniques · Video Coding and Compression Technologies
