Large Language Model Watermark Stealing With Mixed Integer Programming
Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, Leo Yu Zhang, Chao Chen, Shengshan Hu, Asif Gill, Shirui Pan

TL;DR
This paper introduces a mixed integer programming-based attack that steals the green list of an LLM watermark and uses it to remove the watermark from generated text, exposing vulnerabilities in current watermarking schemes even when the attacker has minimal knowledge.
Contribution
The paper presents a novel green list stealing attack formulated as a mixed integer programming problem, demonstrating its effectiveness against state-of-the-art LLM watermarking methods.
Findings
The attack successfully steals the green list and removes the watermark in experiments
It remains effective even with no prior knowledge of the watermark scheme
Current watermarking approaches are vulnerable to this kind of attack
Abstract
The Large Language Model (LLM) watermark is a newly emerging technique that shows promise in addressing concerns surrounding LLM copyright, monitoring AI-generated text, and preventing its misuse. The LLM watermark scheme commonly includes generating secret keys to partition the vocabulary into green and red lists, applying a perturbation to the logits of tokens in the green list to increase their sampling likelihood, thus facilitating watermark detection to identify AI-generated text if the proportion of green tokens exceeds a threshold. However, recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks, such as token editing, synonym substitution, and paraphrasing, with robustness declining as the number of keys increases. Therefore, the state-of-the-art watermark schemes that employ fewer or single keys have been demonstrated to be…
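The green/red list mechanism described in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation: the function names, the random stand-in logits, and the parameters gamma (green list fraction), delta (logit perturbation), and the 0.7 detection threshold are all assumptions chosen for illustration, and a single fixed key is used for the whole vocabulary.

```python
import math
import random


def make_green_list(vocab_size, secret_key, gamma=0.5):
    """Use a secret key to pick a fraction gamma of token ids as the green list."""
    rng = random.Random(secret_key)
    token_ids = list(range(vocab_size))
    rng.shuffle(token_ids)
    return set(token_ids[: int(gamma * vocab_size)])


def perturb_logits(logits, green, delta):
    """Add delta to the logit of every green token; red tokens are unchanged."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sample_token(logits, rng):
    """Sample a token id from the softmax of the logits."""
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(num_tokens, vocab_size, green, delta, seed):
    """Toy generator: random Gaussian logits stand in for a real language model."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(num_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        tokens.append(sample_token(perturb_logits(logits, green, delta), rng))
    return tokens


def green_fraction(tokens, green):
    """Proportion of tokens in the text that fall in the green list."""
    return sum(t in green for t in tokens) / len(tokens)


def is_watermarked(tokens, green, threshold=0.7):
    """Flag text whose green-token proportion exceeds the (assumed) threshold."""
    return green_fraction(tokens, green) > threshold


if __name__ == "__main__":
    green = make_green_list(50, secret_key=1234)
    marked = generate(400, 50, green, delta=2.0, seed=0)
    plain = generate(400, 50, green, delta=0.0, seed=0)
    print(green_fraction(marked, green), is_watermarked(marked, green))
    print(green_fraction(plain, green), is_watermarked(plain, green))
```

In this toy setting, anyone who recovers the green set can run the same check as the detector, which gives some intuition for why a stolen green list undermines the watermark.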
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Vehicle License Plate Recognition
Methods: OPT · LLaMA
