Large Language Model Watermark Stealing With Mixed Integer Programming

Zhaoxi Zhang; Xiaomei Zhang; Yanjun Zhang; Leo Yu Zhang; Chao Chen; Shengshan Hu; Asif Gill; Shirui Pan

arXiv:2405.19677 · cs.CR · May 31, 2024 · 1 cites

Open Access

TL;DR

This paper introduces a mixed integer programming-based attack that steals the green list and removes the watermark from LLM-generated text, exposing vulnerabilities in current watermarking schemes even when the attacker has minimal knowledge.

Contribution

The paper presents a novel green list stealing attack formulated as a mixed integer programming problem, demonstrating its effectiveness against state-of-the-art LLM watermarking methods.

Findings

01 The attack successfully steals the green list and removes the watermark in experiments.

02 The attack remains effective even with no prior knowledge of the watermark scheme.

03 Current watermarking approaches have exploitable vulnerabilities.

Abstract

The Large Language Model (LLM) watermark is a newly emerging technique that shows promise in addressing concerns surrounding LLM copyright, monitoring AI-generated text, and preventing its misuse. The LLM watermark scheme commonly includes generating secret keys to partition the vocabulary into green and red lists, applying a perturbation to the logits of tokens in the green list to increase their sampling likelihood, thus facilitating watermark detection to identify AI-generated text if the proportion of green tokens exceeds a threshold. However, recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks, such as token editing, synonym substitution, and paraphrasing, with robustness declining as the number of keys increases. Therefore, the state-of-the-art watermark schemes that employ fewer or single keys have been demonstrated to be…
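The green/red list scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the single fixed key, the green fraction `gamma`, the logit bias `delta`, the detection `threshold`, and the toy "model" with uniform logits are all assumptions chosen for illustration.

```python
import math
import random


def make_green_list(vocab_size, key, gamma=0.5):
    """Use a secret key to pick a gamma fraction of token ids as the green list."""
    rng = random.Random(key)  # the key seeds the partition (hypothetical choice)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def perturb_logits(logits, green, delta=2.0):
    """Add a bias delta to the logit of every green token."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sample_token(logits, rng):
    """Sample one token id from softmax(logits)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(n_tokens, vocab_size, rng, green=None, delta=2.0):
    """Toy stand-in for an LLM: uniform logits, optionally watermarked."""
    tokens = []
    for _ in range(n_tokens):
        logits = [0.0] * vocab_size
        if green is not None:
            logits = perturb_logits(logits, green, delta)
        tokens.append(sample_token(logits, rng))
    return tokens


def green_fraction(tokens, green):
    """Proportion of tokens that fall in the green list."""
    return sum(t in green for t in tokens) / len(tokens)


def is_watermarked(tokens, green, threshold=0.7):
    """Flag text whose green-token proportion exceeds the threshold."""
    return green_fraction(tokens, green) > threshold


# Small demo with assumed parameters.
demo_green = make_green_list(vocab_size=50, key=1234)
demo_rng = random.Random(0)
demo_marked = generate(400, 50, demo_rng, green=demo_green)
demo_plain = generate(400, 50, demo_rng)
print("watermarked green fraction:", green_fraction(demo_marked, demo_green))
print("plain green fraction:", green_fraction(demo_plain, demo_green))
```

In this sketch the detector needs the green list itself, which is why an attacker who recovers that list can both detect and evade the watermark.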

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Vehicle License Plate Recognition

Methods: OPT · LLaMA