Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
Gokul Ganesan

TL;DR
This paper introduces cross-lingual summarization attacks (CLSA), which remove watermarks from AI-generated text by translating it into another language and summarizing it there. The attack exposes a significant vulnerability in current watermarking methods.
Contribution
The paper demonstrates that cross-lingual summarization attacks outperform monolingual paraphrasing at destroying watermark signals while maintaining semantic fidelity, exposing a new security challenge for watermarking.
Findings
CLSA reduces watermark detection accuracy to near chance levels.
CLSA preserves semantic content and task utility.
Cross-lingual attacks outperform monolingual paraphrasing in watermark removal.
Abstract
Watermarking has been proposed as a lightweight mechanism to identify AI-generated text, with schemes typically relying on perturbations to token distributions. While prior work shows that paraphrasing can weaken such signals, these attacks remain partially detectable or degrade text quality. We demonstrate that cross-lingual summarization attacks (CLSA) -- translation to a pivot language followed by summarization and optional back-translation -- constitute a qualitatively stronger attack vector. By forcing a semantic bottleneck across languages, CLSA systematically destroys token-level statistical biases while preserving semantic fidelity. In experiments across multiple watermarking schemes (KGW, SIR, XSIR, Unigram) and five languages (Amharic, Chinese, Hindi, Spanish, Swahili), we show that CLSA reduces watermark detection accuracy more effectively than monolingual paraphrase at…
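To make "token-level statistical biases" concrete, here is a minimal sketch of a KGW-style detector: the previous token pseudo-randomly selects a "green" fraction gamma of the vocabulary, generation favors green tokens, and detection runs a one-proportion z-test on the green count. This is an illustration under stated assumptions, not the paper's code. The key, GAMMA, the toy generator and all function names are hypothetical. For simplicity, green membership is decided by hashing each (previous token, token) pair, rather than by materializing a permuted green list as KGW does.

```python
import hashlib
import math
import random

GAMMA = 0.25          # assumed green-list fraction
KEY = b"secret-key"   # hypothetical watermark key


def is_green(prev_token: int, token: int, gamma: float = GAMMA, key: bytes = KEY) -> bool:
    """Keyed pseudo-random green-list membership for the pair (prev_token, token).

    Simplification: each pair is green with probability ~gamma, instead of
    partitioning a vocabulary permutation seeded by prev_token as in KGW.
    """
    data = key + prev_token.to_bytes(8, "big") + token.to_bytes(8, "big")
    u = int.from_bytes(hashlib.sha256(data).digest()[:8], "big") / 2**64
    return u < gamma


def kgw_z_score(tokens: list[int], gamma: float = GAMMA, key: bytes = KEY) -> float:
    """z = (green - gamma*T) / sqrt(T * gamma * (1 - gamma)) over T scored tokens."""
    t = len(tokens) - 1  # the first token has no predecessor, so it is not scored
    if t <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, c, gamma, key) for p, c in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def generate(n: int, vocab_size: int = 1000, watermark: bool = True, seed: int = 0) -> list[int]:
    """Toy 'language model': uniform random tokens, optionally forced to be green."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < n:
        cand = rng.randrange(vocab_size)
        if watermark and not is_green(tokens[-1], cand):
            continue  # hard watermark: reject red continuations
        tokens.append(cand)
    return tokens


wm = generate(200, watermark=True)
print(round(kgw_z_score(wm), 1))  # 24.4: all 199 scored tokens are green
plain = generate(200, watermark=False, seed=1)
print(round(kgw_z_score(plain), 1))  # close to 0 for unwatermarked tokens
```

The toy does not model translation or summarization. Replacing the watermarked sequence with an unrelated token sequence stands in for the attack's output. Because a pivot-language summary shares almost no (previous token, token) pairs with the original, its green count falls back toward gamma*T and the z-score toward zero. This is the mechanism by which CLSA removes the signal while the meaning survives.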
