An Unforgeable Publicly Verifiable Watermark for Large Language Models

Aiwei Liu, Leyi Pan, Xuming Hu, Shu'ang Li, Lijie Wen, Irwin King, and Philip S. Yu

arXiv:2307.16230 · cs.CL · May 28, 2024 · 5 cites

TL;DR

This paper introduces UPV, an unforgeable, publicly verifiable watermarking method for large language models that ensures high detection accuracy and security against forgery, using separate neural networks for generation and detection.

Contribution

The paper presents a novel watermarking algorithm that uses two different neural networks for generation and detection, enhancing security and efficiency over existing methods.
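The split between the two networks can be pictured with a small structural sketch. It only illustrates the idea stated here and in the abstract (separate generation and detection networks that reference one token embedding table). The class names, layer shapes, pooling step, and random untrained weights are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


class SharedEmbedding:
    """One token embedding table, referenced by both networks."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, token):
        return self.table[token]


class GeneratorNet:
    """Hypothetical generation-side network: labels the last token of a window."""

    def __init__(self, embedding, window, seed=1):
        rng = random.Random(seed)
        self.embedding = embedding
        self.window = window
        # Assumed single linear layer over the concatenated window embeddings.
        self.weights = [rng.uniform(-1.0, 1.0)
                        for _ in range(window * embedding.dim)]

    def is_green(self, window_tokens):
        if len(window_tokens) != self.window:
            raise ValueError("expected exactly `window` tokens")
        features = [x for t in window_tokens for x in self.embedding(t)]
        return sum(w * x for w, x in zip(self.weights, features)) > 0.0


class DetectorNet:
    """Hypothetical detection-side network: scores a whole token sequence."""

    def __init__(self, embedding, seed=2):
        rng = random.Random(seed)
        self.embedding = embedding  # same object as the generator's table
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(embedding.dim)]

    def score(self, tokens):
        if not tokens:
            raise ValueError("need at least one token")
        dim = self.embedding.dim
        # Mean-pool the shared embeddings, then apply a logistic output.
        pooled = [sum(self.embedding(t)[i] for t in tokens) / len(tokens)
                  for i in range(dim)]
        logit = sum(w * p for w, p in zip(self.weights, pooled))
        return 1.0 / (1.0 + math.exp(-logit))
```

In this sketch the detector holds its own weights and never sees the generator's weights. Only the embedding table is common to both, which is the sharing the abstract credits for efficient detector training.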

Findings

01. High detection accuracy achieved
02. Efficient neural network-based detection
03. Watermark forgery complexity confirmed

Abstract

Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which makes the detection network achieve a high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational…
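To make the limitation described in the abstract concrete, the following is a minimal sketch of a key-based detector of the kind the paper contrasts itself with. It is not UPV. It is a simplified green-list scheme in the style of Kirchenbauer et al. (2023), and the hash construction, the `gamma` value, the toy generator, and all function names are assumptions chosen for illustration.

```python
import hashlib
import math
import random


def is_green(prev_token, token, secret_key, gamma=0.5):
    """Keyed pseudo-random green/red label for `token` given its predecessor."""
    payload = secret_key + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens, secret_key, gamma=0.5):
    """One-proportion z-test on the number of green tokens."""
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, secret_key, gamma)
                for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def generate_watermarked(length, vocab_size, secret_key, gamma=0.5, seed=0):
    """Toy stand-in for an LLM: samples random tokens, keeping only green ones."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < length:
        candidate = rng.randrange(vocab_size)
        if is_green(tokens[-1], candidate, secret_key, gamma):
            tokens.append(candidate)
    return tokens
```

Here `z_score` needs the same `secret_key` that `generate_watermarked` used. Calling it with a different key gives a score near zero, and anyone who holds the real key can also run the generator. That shared dependency is what the abstract describes as making public detection vulnerable to counterfeiting.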

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. LLM watermarking is a very important problem.
2. The proposed method can handle the security breaches and counterfeiting problems of existing LLM watermarking methods.
3. The proposed method is effective according to the reported results.

Weaknesses

1. Please correct me if I am wrong. It seems that there is no comparison with baseline methods in terms of both watermarking effectiveness and text generation quality.
2. It is not clear whether text edit methods like paraphrase can make the proposed method invalid.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

* The experimentation section is thorough, although it could be explained better
* The authors tackle an important and timely problem of watermarking language models.
* The idea of splitting the generation and verification watermarking key is interesting

Weaknesses

**Unclear Definition of Private Watermarking**

Some statements are confusing. For example, on p1, the authors write the following.

> However, current watermarking algorithms are all public, which means the detection of watermarks requires the key from the watermark generation process.

But then, the authors state that any watermarking method can be made private by limiting who has access to this detection key (also p1).

> Although Kirchenbauer et al. (2023) have suggested that the watermark

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

The paper has the following strengths:

+ The authors propose a new private watermarking scheme that disentangles watermark generation and watermark detection. This addresses the privacy concern of requiring the secret key in watermark detection.
+ Empirical results show that it's difficult to reverse watermark generation from watermark detection, and also the proposed detection method achieves high detection rate.

Weaknesses

The paper has the following weaknesses:

- The contribution of the private watermarking scheme is not clearly justified. If the designer wants to protect the secret key, he can use a public key encryption scheme to design the watermarking scheme. Particularly, they can use a secret key in watermark generation, and a public key for watermark detection. It's not clear why the designer has to use two neural networks for watermark generation/detection.
- There is no clear discussion about how water

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Internet Traffic Analysis and Secure E-voting