Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
KiYoon Yoo, Wonhyuk Ahn, Nojun Kwak

TL;DR
This paper introduces a multi-bit watermarking technique for large language models. The watermark carries traceable information and can be extracted robustly without model access or finetuning, going beyond zero-bit methods that only detect machine-generated text.
Contribution
The paper proposes Multi-bit Watermark via Position Allocation, which embeds traceable multi-bit information during generation by allocating tokens to different parts of the message, improving robustness and latency over existing methods.
Findings
Outperforms existing methods in robustness and latency.
Enables embedding and extraction of long messages without finetuning.
Maintains text quality while providing traceability and detection.
Abstract
We show the viability of tackling misuses of large language models beyond the identification of machine-generated text. While existing zero-bit watermark methods focus on detection only, some malicious misuses demand tracing the adversary user for counteracting them. To address this, we propose Multi-bit Watermark via Position Allocation, embedding traceable multi-bit information during language model generation. Through allocating tokens onto different parts of the messages, we embed longer messages in high corruption settings without added latency. By independently embedding sub-units of messages, the proposed method outperforms the existing works in terms of robustness and latency. Leveraging the benefits of zero-bit watermarking, our method enables robust extraction of the watermark without any model access, embedding and extraction of long messages (≥ 32-bit) without…
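The position-allocation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a vocabulary of integer token ids, a hypothetical shared key `SECRET_KEY`, a uniform random sampler standing in for the language model, and a hard restriction to the favored half of the vocabulary where a real system would more plausibly bias logits. All function names (`allocate`, `embed`, `extract`) are made up for illustration.

```python
import hashlib
import random

VOCAB_SIZE = 1000          # toy vocabulary of integer token ids (assumption)
SECRET_KEY = "demo-key"    # hypothetical key shared by embedder and extractor
HALF = VOCAB_SIZE // 2


def _seed(prev_token: int) -> int:
    """Derive a deterministic seed from the key and the previous token."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def allocate(prev_token: int, msg_len: int):
    """Pick the message position this token carries, plus a vocabulary permutation."""
    rng = random.Random(_seed(prev_token))
    position = rng.randrange(msg_len)
    perm = list(range(VOCAB_SIZE))
    rng.shuffle(perm)
    return position, perm


def embed(message_bits, num_tokens, sampler_seed=0):
    """Generate tokens so that each one encodes the bit at its allocated position."""
    sampler = random.Random(sampler_seed)  # stand-in for a language model
    prev, tokens = 0, []                   # 0 acts as a start token
    for _ in range(num_tokens):
        position, perm = allocate(prev, len(message_bits))
        bit = message_bits[position]
        # Bit 0 favors the first half of the permuted vocabulary, bit 1 the second.
        favored = perm[bit * HALF:(bit + 1) * HALF]
        prev = sampler.choice(favored)
        tokens.append(prev)
    return tokens


def extract(tokens, msg_len):
    """Recover the message by majority vote per position, without any model access."""
    votes = [[0, 0] for _ in range(msg_len)]
    prev = 0
    for tok in tokens:
        position, perm = allocate(prev, msg_len)
        bit = 0 if perm.index(tok) < HALF else 1
        votes[position][bit] += 1
        prev = tok
    return [0 if zeros >= ones else 1 for zeros, ones in votes]
```

Because each token votes only for its own message position, the bits are decoded independently of one another. In this sketch a substituted token disturbs at most its own vote and the allocation of the next token, which gives some intuition for why per-position majority voting can tolerate corruption. Calling `extract(embed(bits, 400), len(bits))` on an 8-bit list `bits` exercises the round trip.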
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning
