Necessary and Sufficient Watermark for Large Language Models
Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada

TL;DR
This paper introduces the Necessary and Sufficient Watermark (NS-Watermark), a novel method for watermarking LLM-generated texts that maintains text quality while improving detection accuracy, especially in translation tasks.
Contribution
It formulates the minimal constraints for effective watermarking as a constrained optimization problem and provides an efficient algorithm to implement it.
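As a rough illustration of what a "minimal constraint" can look like, the sketch below assumes the green-list z-score test used by earlier watermarking work such as Soft-Watermark. This summary does not state the paper's formula, so the names `z_score` and `min_green_tokens`, the green-list ratio `gamma`, and the detection `threshold` are illustrative assumptions rather than the authors' notation or implementation.

```python
import math


def z_score(num_green: int, num_tokens: int, gamma: float = 0.25) -> float:
    """One-proportion z-statistic for the number of green tokens in a text.

    Assumption: each token of human-written text lands in the green list
    with probability `gamma`, independently of the others.
    """
    expected = gamma * num_tokens
    std = math.sqrt(gamma * (1.0 - gamma) * num_tokens)
    return (num_green - expected) / std


def min_green_tokens(num_tokens: int, gamma: float = 0.25,
                     threshold: float = 4.0) -> int:
    """Smallest green-token count whose z-score reaches `threshold`.

    If the result exceeds `num_tokens`, a text of that length cannot
    pass the test at all under these (hypothetical) settings.
    """
    std = math.sqrt(gamma * (1.0 - gamma) * num_tokens)
    return math.ceil(gamma * num_tokens + threshold * std)


# Example: a 100-token text with illustrative settings gamma=0.25, threshold=4.
needed = min_green_tokens(100)
print(needed, round(z_score(needed, 100), 3))
```

Under these assumptions, a generator only needs to reach `min_green_tokens(T)` green tokens in a text of length `T` to be detected; forcing more green tokens than that constrains the text further without changing the detection outcome.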
Findings
NS-Watermark produces more natural texts than existing methods.
It improves BLEU scores by up to 30 points over existing watermarking methods in translation tasks.
It accurately distinguishes LLM texts from human texts without degrading quality.
Abstract
In recent years, large language models (LLMs) have achieved remarkable performances in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive minimum constraints required to be imposed on…
Peer Reviews
Decision · Submitted to ICLR 2024
(1) The paper addresses a significant issue in the field of natural language processing, specifically in mitigating risks associated with the malicious use of large language models. (2) The authors propose a novel method, the Necessary and Sufficient Watermark (NS-Watermark), which is an innovative solution to the problem at hand. (3) The paper provides a comprehensive analysis of the proposed method, including a well-structured theoretical analysis and practical implementation details.
(1) The claim that text quality is unaffected relative to the unwatermarked model is overstated. There is some quality drop compared to the unwatermarked model, though it is small. The authors can claim that the text quality is better than Soft-Watermark's, so the phrase "without degrading the text quality" is not fully accurate. (2) The proposed method explores a less conservative region of z-scores than Soft-Watermark. Soft-Watermark's conservative approach provides robustness against attacks. And
1. The proposed method can preserve the quality of the generated text at a certain level. 2. The authors provide an approximate solution with linear time complexity. 3. The authors conduct a series of experiments to evaluate effectiveness in terms of text quality, detection accuracy, and sensitivity to the hyperparameters.
1. Regarding the methodology: 1.1 The paper needs a more in-depth comparison between the proposed naive method and the linear-time approximation. Since the linear-time method is an approximation, it would be valuable to understand what it sacrifices to obtain linear time complexity. 1.2 The illustration in Figure 1 is not very clear. What is the difference between the two sub-figures in Figure 1(a)? 2. Regarding the experiment: 2.1 Lack the c
- This work models the watermarking problem as a constrained optimization problem and then solves it by combining dynamic programming and beam search. The authors also propose an approximation method to reduce the complexity.
- The proposed method demonstrates superior performance compared to the baselines.
- The paper is effectively structured and exhibits clear and concise writing.
- NS-Watermark requires solving a constrained optimization problem. Although the authors mention the running time in Appendix D.1, a comparison of its cost against the baselines is missing.
- The proposed algorithms depend on beam search. However, in the experiments, the authors use only one beam size (k = 1).
- It is not clear how robust NS-Watermark is to attacks. For example, could NS-Watermark be removed by simply adding a list of red words?
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques · Topic Modeling
