Multi-Bit Distortion-Free Watermarking for Large Language Models
Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, Brian Mark

TL;DR
This paper introduces a multi-bit, distortion-free watermarking technique for large language models. It embeds meta-information in the watermark and recovers it with a computationally efficient decoder at a low bit error rate, going beyond prior zero-bit methods that only tag text as AI-generated.
Contribution
The paper extends an existing zero-bit distortion-free watermarking method to embed multiple bits of meta-information, and develops a computationally efficient decoder that extracts those bits with a low bit error rate.
Findings
Successfully embeds multiple bits of information in watermarked text.
Achieves a low bit error rate with computationally efficient decoding.
Preserves text quality: the watermark is distortion-free, which avoids the exposure to adversarial detection that distortion-based watermarks incur.
Abstract
Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution, but they also distort the quality of the text, exposing the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. The prior methods generally embed zero-bit watermarks that do not provide additional information beyond tagging a text as being AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
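The abstract does not describe the construction itself, so the following is only a toy sketch of what "multi-bit" and "distortion-free" can mean together, not the paper's scheme. It assumes a binary token alphabet, a model that supplies P(token = 1) at each position, and a secret key shared by the generator and the decoder. All names (`keyed_uniform`, `embed`, `decode`, `num_messages`) are hypothetical. The message is embedded by shifting a key-derived uniform value modulo 1; since the shifted value is still uniform, each token keeps the model's sampling probability. The decoder scores every candidate message without access to the model and returns the best one.

```python
import hashlib
import math


def keyed_uniform(key: bytes, index: int) -> float:
    """Pseudorandom value in [0, 1) derived from the secret key and token index."""
    digest = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def shifted_uniform(key: bytes, index: int, message: int, num_messages: int) -> float:
    """Shift the keyed value by message/num_messages modulo 1 (still uniform)."""
    return (keyed_uniform(key, index) + message / num_messages) % 1.0


def embed(key: bytes, message: int, num_messages: int, probs: list) -> list:
    """Sample binary tokens; probs[i] is the (assumed) model probability of token 1."""
    tokens = []
    for i, p in enumerate(probs):
        u = shifted_uniform(key, i, message, num_messages)
        # P(u < p) = p because u is uniform, so the token distribution is unchanged.
        tokens.append(1 if u < p else 0)
    return tokens


def decode(key: bytes, num_messages: int, tokens: list) -> int:
    """Return the candidate message whose shifted key values best explain the tokens."""

    def score(candidate: int) -> float:
        total = 0.0
        for i, x in enumerate(tokens):
            u = shifted_uniform(key, i, candidate, num_messages)
            # For the true message, u tends to be small when x == 1 and
            # large when x == 0, so v tends to be small and -log(v) large.
            v = u if x == 1 else 1.0 - u
            total += -math.log(max(v, 1e-12))  # guard against log(0)
        return total

    # Exhaustive search over candidates: cost grows with num_messages.
    return max(range(num_messages), key=score)
```

In this sketch decoding is an exhaustive search over all candidate messages, which becomes expensive as the number of embedded bits grows; that cost is the kind of problem a computationally efficient decoder has to address. Reliability also depends on the text being long enough and the model's token probabilities not being too close to 0 or 1.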
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis
