StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models
Ya Jiang, Chuxiong Wu, Massieh Kordi Boroujeny, Brian Mark, Kai Zeng

TL;DR
StealthInk is a multi-bit watermarking method for large language models that preserves text quality, embeds provenance data, and supports fast, reliable detection without access to the model's API or prompts.
Contribution
The paper presents a stealthy multi-bit watermarking scheme that preserves the original text distribution and improves the traceability of LLM-generated text, going beyond zero-bit methods that support detection but not identification.
Findings
High detectability and resilience demonstrated across tasks
Preserves original text distribution effectively
Enables embedding of multiple bits of information
Abstract
Watermarking for large language models (LLMs) offers a promising approach to identifying AI-generated text. Existing approaches, however, either compromise the distribution of original generated text by LLMs or are limited to embedding zero-bit information that only allows for watermark detection but ignores identification. We present StealthInk, a stealthy multi-bit watermarking scheme that preserves the original text distribution while enabling the embedding of provenance data, such as userID, TimeStamp, and modelID, within LLM-generated text. This enhances fast traceability without requiring access to the language model's API or prompts. We derive a lower bound on the number of tokens necessary for watermark detection at a fixed equal error rate, which provides insights on how to enhance the capacity. Comprehensive empirical evaluations across diverse tasks highlight the…
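To make the idea of a multi-bit payload concrete, the following is a minimal sketch of how provenance fields such as userID, TimeStamp, and modelID could be serialized into a fixed-width bit string before embedding. It does not reproduce StealthInk's embedding or detection procedure. The field widths (16, 32, and 8 bits) and the names `Provenance`, `pack`, and `unpack` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical field widths in bits; these are assumptions, not values from the paper.
USER_BITS, TIME_BITS, MODEL_BITS = 16, 32, 8
TOTAL_BITS = USER_BITS + TIME_BITS + MODEL_BITS


@dataclass(frozen=True)
class Provenance:
    user_id: int
    timestamp: int  # for example, Unix time in seconds
    model_id: int


def pack(p: Provenance) -> str:
    """Concatenate the three fields into one fixed-width bit string."""
    fields = (
        (p.user_id, USER_BITS),
        (p.timestamp, TIME_BITS),
        (p.model_id, MODEL_BITS),
    )
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
    # Zero-pad each field to its width so the layout is unambiguous.
    return "".join(format(value, f"0{width}b") for value, width in fields)


def unpack(bits: str) -> Provenance:
    """Recover the fields from a bit string produced by pack()."""
    if len(bits) != TOTAL_BITS:
        raise ValueError(f"expected {TOTAL_BITS} bits, got {len(bits)}")
    user_end = USER_BITS
    time_end = USER_BITS + TIME_BITS
    return Provenance(
        user_id=int(bits[:user_end], 2),
        timestamp=int(bits[user_end:time_end], 2),
        model_id=int(bits[time_end:], 2),
    )
```

Under these assumed widths the payload is 56 bits. A longer payload generally needs more generated tokens to decode reliably, which is the trade-off the abstract's lower bound on token count addresses.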
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Generative Adversarial Networks and Image Synthesis
