Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models
Minhao Bai

TL;DR
This paper introduces a novel, robust, and unforgeable single-bit watermarking scheme for language models that can embed multiple watermark signals, enhancing content traceability and integrity verification.
Contribution
First to propose an undetectable, robust, single-bit watermarking scheme capable of embedding two different signals for language models.
Findings
Achieves robustness comparable to that of advanced zero-bit schemes.
Constructs a multi-bit scheme using prompt hash or generated content as watermark signals.
Demonstrates practical effectiveness and robustness through experiments.
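The idea of using a prompt hash as the watermark signal can be illustrated with a small sketch. This is not the paper's construction: the embedding and extraction steps are not described in this summary, so the sketch assumes a watermark decoder (not shown) has already recovered a bit string from the text. The helper names `prompt_to_bits` and `verify_claimed_prompt`, the choice of SHA-256, and the 64-bit length are all assumptions made for illustration.

```python
import hashlib


def prompt_to_bits(prompt: str, n_bits: int = 64) -> list[int]:
    """Derive n_bits message bits from a SHA-256 hash of the prompt.

    Hypothetical helper: the hash function and bit length are assumptions,
    not details taken from the paper.
    """
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256 for SHA-256")
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    bits = []
    for byte in digest:
        # Unpack each byte most-significant bit first.
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits[:n_bits]


def verify_claimed_prompt(extracted_bits: list[int], claimed_prompt: str) -> bool:
    """Check whether bits recovered from a text match the claimed prompt.

    `extracted_bits` is assumed to come from a watermark decoder that is
    outside the scope of this sketch.
    """
    return extracted_bits == prompt_to_bits(claimed_prompt, len(extracted_bits))


# Simulate a decoder that recovered the bits embedded for the real prompt.
recovered = prompt_to_bits("Summarize the history of the printing press.")
genuine = verify_claimed_prompt(recovered, "Summarize the history of the printing press.")
forged = verify_claimed_prompt(recovered, "Explain how to pick a lock.")
```

Under these assumptions, a claim that the text answers a fabricated prompt fails verification unless the fabricated prompt happens to share the same hash bits, which is the kind of forgery the abstract describes.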
Abstract
Watermarking is an effective way to trace model-generated content. Current watermarking methods cannot resist forgery attacks, such as a deceptive claim that model-generated content is a response to a fabricated prompt. None of them can be made unforgeable without degrading robustness. Unforgeability demands that the watermarked output be not only detectable but also verifiable for integrity, indicating whether it has been modified. This underscores the necessity and significance of a multi-bit watermarking scheme. Recent works try to build multi-bit schemes on top of existing zero-bit watermarking schemes, but they either degrade robustness or impose a significant computational burden. We aim to design a novel single-bit watermarking scheme that can embed two different watermark signals. This paper's main contribution is that we are the first to propose an…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Steganography and Watermarking Techniques · Music and Audio Processing
