Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

Zhiwei He; Binglin Zhou; Hongkun Hao; Aiwei Liu; Xing Wang; Zhaopeng Tu; Zhuosheng Zhang; Rui Wang

arXiv:2402.14007 · cs.CL · June 5, 2024 · 3 cites


TL;DR

This paper investigates whether text watermarks for LLMs remain effective after translation into other languages, reveals the limitations of current methods, and proposes a new attack and a corresponding defense method for cross-lingual watermark robustness.

Contribution

It introduces the concept of cross-lingual consistency in text watermarking, shows that current watermarks lack it and are therefore vulnerable to translation-based removal, and proposes X-SIR as a defense against such watermark removal attacks.
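To make "maintaining effectiveness after translation" concrete, the sketch below scores text with a simplified green-list detector in the spirit of Kirchenbauer et al.'s watermark and compares the score before and after a stand-in "translation". This page does not name the three watermarking methods studied, so treat the scheme as an illustrative assumption. The per-pair hash, `GAMMA`, `toy_watermarked`, and the `relative_drop` consistency measure are all hypothetical, not the paper's definitions.

```python
import hashlib
import math
import random

GAMMA = 0.25  # hypothetical fraction of the vocabulary marked "green" at each step


def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Simplified green-list test: hash (previous token, candidate) into [0, 1).

    Real green-list schemes seed a PRNG with the previous token and partition the
    vocabulary; this per-pair hash is a stand-in with the same expected green rate.
    """
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens: list[str], gamma: float = GAMMA) -> float:
    """One-proportion z-test: how far the green-token count exceeds chance."""
    n = len(tokens) - 1  # number of scored (prev, token) pairs
    if n <= 0:
        return 0.0
    green = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def toy_watermarked(vocab: list[str], length: int, start: str = "<s>") -> list[str]:
    """Hypothetical 'generator' that always picks a green token when one exists."""
    tokens = [start]
    for _ in range(length):
        greens = [t for t in vocab if is_green(tokens[-1], t)]
        tokens.append(greens[0] if greens else vocab[0])
    return tokens


def relative_drop(z_before: float, z_after: float) -> float:
    """Hypothetical consistency measure: share of detection strength lost."""
    return (z_before - z_after) / z_before if z_before else 0.0


vocab = [f"w{i}" for i in range(50)]
original = toy_watermarked(vocab, 100)
# Stand-in for translation: every token is re-chosen without knowledge of the green lists.
rng = random.Random(0)
translated = ["<s>"] + [rng.choice(vocab) for _ in range(100)]

z_orig, z_trans = z_score(original), z_score(translated)
print(f"z before: {z_orig:.1f}, z after: {z_trans:.1f}, drop: {relative_drop(z_orig, z_trans):.0%}")
```

The design choice to illustrate: the detector only sees surface tokens, so any process that regenerates the tokens (such as translation) pushes the green fraction back toward `GAMMA` and the z-score toward zero.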

Findings

01. Current text watermarking lacks cross-lingual consistency.
02. CWRA (Cross-lingual Watermark Removal Attack) effectively removes watermarks without performance loss.
03. X-SIR improves robustness against watermark removal.
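Watermark removal is typically quantified by the detection AUC, and the abstract reports that CWRA drives it to a "random-guessing level", i.e. about 0.5. As a minimal sketch, AUC can be computed as the probability that a randomly chosen watermarked text outscores a randomly chosen human-written one. The scores below are made up for illustration and are not results from the paper.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC in its Mann-Whitney form: P(random positive outscores random negative).

    Ties count as half a win, so identical score distributions give 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up detector scores, for illustration only.
human = [0.2, -0.5, 1.1]
watermarked = [5.1, 6.3, 4.8]
after_attack = [0.3, -0.2, 0.9]

print(roc_auc(watermarked, human))             # 1.0: perfectly separable
print(round(roc_auc(after_attack, human), 3))  # 0.556: close to random guessing
```

The pairwise loop is O(n·m), which is fine for a toy example; for real evaluations a library routine such as scikit-learn's `roc_auc_score` computes the same quantity efficiently.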

Abstract

Text watermarking technology aims to tag and identify content produced by large language models (LLMs) to prevent misuse. In this study, we introduce the concept of cross-lingual consistency in text watermarking, which assesses the ability of text watermarks to maintain their effectiveness after being translated into other languages. Preliminary empirical results from two LLMs and three watermarking methods reveal that current text watermarking technologies lack consistency when texts are translated into various languages. Based on this observation, we propose a Cross-lingual Watermark Removal Attack (CWRA) to bypass watermarking by first obtaining a response from an LLM in a pivot language, which is then translated into the target language. CWRA can effectively remove watermarks, decreasing the AUCs to a random-guessing level without performance loss. Furthermore, we analyze two key…
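The CWRA pipeline described in the abstract can be sketched as three calls. This assumes the user's prompt is first translated into the pivot language so that the LLM answers in that language. `llm` and `translate` are hypothetical callables standing in for a watermarked model and any machine-translation system; they are not APIs from the official repository.

```python
from typing import Callable

Generate = Callable[[str], str]             # hypothetical: prompt -> watermarked LLM response
Translate = Callable[[str, str, str], str]  # hypothetical: (text, src_lang, tgt_lang) -> translation


def cwra(prompt: str, target_lang: str, pivot_lang: str,
         llm: Generate, translate: Translate) -> str:
    """Sketch of a cross-lingual watermark removal attack."""
    # 1. Move the request into the pivot language (assumed step).
    pivot_prompt = translate(prompt, target_lang, pivot_lang)
    # 2. The watermarked LLM answers in the pivot language, so the watermark
    #    is embedded in pivot-language tokens.
    pivot_response = llm(pivot_prompt)
    # 3. Translating back regenerates every target-language token, which is
    #    what the attack relies on to strip the watermark signal.
    return translate(pivot_response, pivot_lang, target_lang)


# Usage with trivial stubs that only record the data flow.
stub_llm = lambda p: f"ANSWER[{p}]"
stub_translate = lambda text, src, tgt: f"{tgt}({text})"
print(cwra("Hi", "en", "zh", stub_llm, stub_translate))  # en(ANSWER[zh(Hi)])
```

Passing the model and translator as parameters keeps the sketch independent of any particular LLM or translation service.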


Code & Models

Repositories

zwhe99/x-sir
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Handwritten Text Recognition Techniques