Revisiting the Robustness of Watermarking to Paraphrasing Attacks

Saksham Rastogi; Danish Pruthi

arXiv:2411.05277 · cs.CR · November 11, 2024 · 2 cites

TL;DR

This paper examines how robust text watermarking schemes are to paraphrasing attacks. It shows that access to only a limited number of generations from a black-box watermarked model is enough to make paraphrasing far more effective at evading watermark detection, which calls the robustness claims of these schemes into question.

Contribution

The study demonstrates that watermarking schemes, including ones designed and claimed to be robust to paraphrasing, can be reverse-engineered from a limited number of black-box generations and then evaded, which underlines the need for more robust methods.

Findings

01. Paraphrasing attacks informed by a limited number of watermarked generations can effectively evade watermark detection.

02. Current watermarking methods are vulnerable to reverse-engineering.

03. The robustness claims of some watermarking techniques are overstated.

Abstract

Amidst rising concerns about the internet being proliferated with content generated from language models (LMs), watermarking is seen as a principled way to certify whether text was generated from a model. Many recent watermarking techniques slightly modify the output probabilities of LMs to embed a signal in the generated output that can later be detected. Since early proposals for text watermarking, questions about their robustness to paraphrasing have been prominently discussed. Lately, some techniques are deliberately designed and claimed to be robust to paraphrasing. However, such watermarking schemes do not adequately account for the ease with which they can be reverse-engineered. We show that with access to only a limited number of generations from a black-box watermarked model, we can drastically increase the effectiveness of paraphrasing attacks to evade watermark detection,…
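To make the abstract's description of "embedding a signal that can later be detected" concrete, here is a minimal toy sketch of one common family of schemes, in which a keyed hash marks part of the vocabulary as "green" for each context and detection counts green tokens. This is an illustrative assumption, not the scheme or attack studied in the paper: the names `is_green`, `z_score`, and `toy_watermarked_sequence`, the key, `VOCAB_SIZE`, and `GAMMA` are all hypothetical, and a real scheme biases LM logits rather than picking tokens outright.

```python
import hashlib
import math

VOCAB_SIZE = 1000  # hypothetical toy vocabulary size
GAMMA = 0.5        # assumed fraction of the vocabulary marked "green"


def is_green(prev_token: int, token: int, key: str = "secret") -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def z_score(tokens: list[int], key: str = "secret") -> float:
    """One-proportion z-test: how far the green-token count sits above chance."""
    n = len(tokens) - 1  # number of (previous, current) token pairs scored
    if n <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def toy_watermarked_sequence(length: int, key: str = "secret") -> list[int]:
    """Stand-in for a watermarked LM: always emit a green token.

    A real scheme would only nudge the output probabilities toward green tokens.
    """
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        tokens.append(next(t for t in range(VOCAB_SIZE) if is_green(prev, t, key)))
    return tokens


watermarked = toy_watermarked_sequence(101)
print(z_score(watermarked))  # every scored token is green, so z = sqrt(100)
```

A detector would flag text whose z-score exceeds some threshold. In this toy picture, a paraphrase that replaces green tokens with non-green ones lowers the score, which is why knowledge of the green/red partition, however it is obtained, would make paraphrasing more effective.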

Code & Models

Repositories

codeboy5/revisiting-watermark-robustness (PyTorch, official)

Videos

Revisiting the Robustness of Watermarking to Paraphrasing Attacks (Underline)

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Internet Traffic Analysis and Secure E-voting · Digital Media Forensic Detection