An Unforgeable Publicly Verifiable Watermark for Large Language Models
Aiwei Liu, Leyi Pan, Xuming Hu, Shu'ang Li, Lijie Wen, Irwin King, and Philip S. Yu

TL;DR
This paper introduces UPV, an unforgeable, publicly verifiable watermarking method for large language models that ensures high detection accuracy and security against forgery, using separate neural networks for generation and detection.
Contribution
The paper presents a novel watermarking algorithm that uses two different neural networks for generation and detection, enhancing security and efficiency over existing methods.
Findings
High detection accuracy achieved
Efficient neural network-based detection
High complexity of forging the watermark confirmed
Abstract
Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which makes the detection network achieve a high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational…
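The split described in the abstract can be pictured with a toy sketch. It is not the authors' implementation: the abstract only states that generation and detection use two different networks that share token embedding parameters. Everything else here is an assumption made for illustration, including the class names, the window-based green/red labelling, the logit bonus `DELTA`, the mean-pooled detector, and the fact that all weights are random and untrained.

```python
import math
import random

VOCAB_SIZE = 16   # toy vocabulary size (assumption)
EMBED_DIM = 4     # toy embedding width (assumption)
WINDOW = 3        # tokens seen by the generation network (assumption)
DELTA = 2.0       # logit bonus for "green" tokens (assumption)


class SharedEmbedding:
    """Token embedding table used by BOTH networks."""

    def __init__(self, rng):
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                      for _ in range(VOCAB_SIZE)]

    def __call__(self, token_id):
        return self.table[token_id]


class GeneratorNet:
    """Private network: labels the last token of a window green (1) or red (0)."""

    def __init__(self, embedding, rng):
        self.embedding = embedding
        self.weights = [rng.uniform(-1.0, 1.0)
                        for _ in range(WINDOW * EMBED_DIM)]

    def is_green(self, window):
        features = [x for tok in window for x in self.embedding(tok)]
        score = sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

    def bias_logits(self, prefix, logits):
        """Add DELTA to every candidate token this network labels green."""
        context = prefix[-(WINDOW - 1):]
        return [logit + DELTA * self.is_green(context + [cand])
                for cand, logit in enumerate(logits)]


class DetectorNet:
    """Public network: scores a whole text with its own (here untrained) weights."""

    def __init__(self, embedding, rng):
        self.embedding = embedding
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

    def score(self, tokens):
        # Mean-pool the shared embeddings, then apply a logistic output.
        pooled = [sum(self.embedding(t)[d] for t in tokens) / len(tokens)
                  for d in range(EMBED_DIM)]
        z = sum(w * x for w, x in zip(self.weights, pooled))
        return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
embedding = SharedEmbedding(rng)
generator = GeneratorNet(embedding, rng)   # kept private
detector = DetectorNet(embedding, rng)     # could be published

prefix = [1, 5, 7]
biased = generator.bias_logits(prefix, [0.0] * VOCAB_SIZE)
next_token = biased.index(max(biased))
probability = detector.score(prefix + [next_token])
```

In this sketch the detector holds no reference to the generator's weights, which is the point of the split: publishing `detector` exposes only the shared embedding table and the detector's own parameters. A usable detector would have to be trained on watermarked and unwatermarked text, which the sketch omits.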
Peer Reviews
Decision·ICLR 2024 poster
1. LLM watermarking is a very important problem.
2. The proposed method can handle the security breaches and counterfeiting problems of existing LLM watermarking methods.
3. The proposed method is effective according to the reported results.
1. Please correct me if I am wrong. It seems that there is no comparison with baseline methods in terms of both watermarking effectiveness and text generation quality.
2. It is not clear whether text edit methods like paraphrase can make the proposed method invalid.
* The experimentation section is thorough, although it could be explained better
* The authors tackle an important and timely problem of watermarking language models.
* The idea of splitting the generation and verification watermarking key is interesting
**Unclear Definition of Private Watermarking** Some statements are confusing. For example, on p1, the authors write the following.

> However, current watermarking algorithms are all public, which means the detection of watermarks requires the key from the watermark generation process.

But then, the authors state that any watermarking method can be made private by limiting who has access to this detection key (also p1).

> Although Kirchenbauer et al. (2023) have suggested that the watermark
The paper has the following strengths:
+ The authors propose a new private watermarking scheme that disentangles watermark generation and watermark detection. This addresses the privacy concern of requiring the secret key in watermark detection.
+ Empirical results show that it's difficult to reverse watermark generation from watermark detection, and also the proposed detection method achieves high detection rate.
The paper has the following weaknesses:
- The contribution of the private watermarking scheme is not clearly justified. If the designer wants to protect the secret key, he can use a public key encryption scheme to design the watermarking scheme. Particularly, they can use a secret key in watermark generation, and a public key for watermark detection. It's not clear why the designer has to use two neural networks for watermark generation/detection.
- There is no clear discussion about how water
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Internet Traffic Analysis and Secure E-voting
