Bypassing LLM Watermarks with Color-Aware Substitutions

Qilong Wu; Varun Chandrasekaran

arXiv:2403.14719 · cs.CR · March 25, 2024 · 1 citation



TL;DR

This paper introduces SCTS, a color-aware attack that evades LLM watermark detection by identifying green (watermarked) tokens and substituting non-green ones, using fewer edits than prior attacks and remaining effective on long texts.

Contribution

We propose SCTS, the first color-aware attack that can reliably remove watermarks from long texts, demonstrating both theoretical and empirical effectiveness against state-of-the-art watermarking.

Findings

01. SCTS evades watermark detection with fewer edits than prior attacks.

02. SCTS can remove watermarks from arbitrarily long texts.

03. Theoretical analysis confirms SCTS's effectiveness.

Abstract

Watermarking approaches have been proposed to identify whether circulated text is human- or large language model (LLM)-generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific ("green") tokens. However, determining the robustness of this watermarking method is an open problem. Existing attack methods fail to evade detection for longer text segments. We overcome this limitation and propose Self Color Testing-based Substitution (SCTS), the first "color-aware" attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output token frequencies. It uses this information to determine token colors, and substitutes green tokens with non-green ones. In our experiments, SCTS successfully evades watermark detection using fewer edits than related work. Additionally, we show both…
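To make the green-token idea concrete, here is a minimal sketch of green-list detection and a color-aware substitution on a toy vocabulary. It is an illustration under stated assumptions, not the authors' implementation. The detector uses the one-proportion z-statistic associated with Kirchenbauer et al., z = (g - γT) / sqrt(T γ (1 - γ)), where g is the number of green tokens among T scored tokens and γ is the green-list fraction. Everything else is hypothetical: the hash-based `is_green` rule, the vocabulary `VOCAB`, the values `GAMMA = 0.5` and `THRESHOLD = 4.0`, and above all the fact that the attacker here reads colors directly from `is_green`. SCTS has no such oracle and instead infers colors from output token frequencies, which this sketch does not model. The substitutions also ignore meaning.

```python
import hashlib
import math
import random

GAMMA = 0.5       # assumed fraction of the vocabulary that is green
THRESHOLD = 4.0   # assumed detection threshold on the z-score
VOCAB = [f"w{i}" for i in range(50)]  # hypothetical toy vocabulary


def is_green(prev_token, token, gamma=GAMMA):
    """Toy color rule: hash (previous token, token) to a number in [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens, gamma=GAMMA):
    """z = (g - gamma*T) / sqrt(T*gamma*(1-gamma)) over tokens with a predecessor."""
    scored = len(tokens) - 1
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def generate_watermarked(length, seed=0):
    """Extreme toy watermark: always emit a green token when one exists."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    while len(tokens) < length:
        greens = [w for w in VOCAB if is_green(tokens[-1], w)]
        tokens.append(rng.choice(greens or VOCAB))
    return tokens


def color_aware_substitute(tokens, target_z=THRESHOLD):
    """Replace green tokens with non-green ones, left to right.

    Assumption: colors are read from is_green (an oracle). Stops as soon
    as the z-score drops below target_z, to keep the edit count small.
    """
    out = list(tokens)
    edits = 0
    for i in range(1, len(out)):
        if z_score(out) < target_z:
            break
        if is_green(out[i - 1], out[i]):
            reds = [w for w in VOCAB if not is_green(out[i - 1], w)]
            if reds:
                out[i] = reds[0]  # toy choice; ignores semantics
                edits += 1
    return out, edits


watermarked = generate_watermarked(201)
attacked, num_edits = color_aware_substitute(watermarked)
z_before = z_score(watermarked)
z_after = z_score(attacked)
print(f"z before: {z_before:.2f}, z after: {z_after:.2f}, edits: {num_edits}")
```

For a fixed green fraction p above γ, the statistic grows as (p - γ) sqrt(T) / sqrt(γ (1 - γ)). This suggests why an attack that leaves any fixed excess of green tokens is increasingly likely to be detected as the text gets longer, and why knowing token colors helps.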



Taxonomy

Topics: Handwritten Text Recognition Techniques · Digital Rights Management and Security · Algorithms and Data Compression