Watermark under Fire: A Robustness Evaluation of LLM Watermarking
Jiacheng Liang, Zian Wang, Lauren Hong, Shouling Ji, Ting Wang

TL;DR
This paper introduces WaterPark, a unified platform for evaluating the robustness of LLM watermarking methods against attacks, providing comprehensive insights into their strengths, limitations, and optimal usage in adversarial settings.
Contribution
It systematically analyzes existing watermarking techniques and develops WaterPark, enabling standardized evaluation and revealing factors affecting robustness.
Findings
Watermarking design choices significantly impact attack robustness.
WaterPark effectively benchmarks 10 watermarking methods and 12 attacks.
Best practices for operating watermarkers in adversarial environments are identified.
Abstract
Various watermarking methods ("watermarkers") have been proposed to identify LLM-generated texts; yet, due to the lack of unified evaluation platforms, many critical questions remain under-explored: i) What are the strengths and limitations of various watermarkers, especially their attack robustness? ii) How do various design choices impact their robustness? iii) How can watermarkers be operated optimally in adversarial environments? To fill this gap, we systematize existing LLM watermarkers and watermark removal attacks, mapping out their design spaces. We then develop WaterPark, a unified platform that integrates 10 state-of-the-art watermarkers and 12 representative attacks. More importantly, by leveraging WaterPark, we conduct a comprehensive assessment of existing watermarkers, unveiling the impact of various design choices on their attack robustness. We further explore the best practices…
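To make "attack robustness" concrete, here is a minimal sketch of the kind of measurement such a platform automates: embed a watermark, apply a removal attack of increasing strength, and record how often detection still succeeds. This is not WaterPark's API. Every name below is hypothetical, the "model" is a random token sampler, the watermark is a toy green-list style scheme chosen only for illustration, and the attack is plain random token substitution. The summary above does not say which watermarkers or attacks the platform includes.

```python
from __future__ import annotations

import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(1000)]  # hypothetical toy vocabulary
GAMMA = 0.5                             # expected fraction of "green" tokens
START = "<s>"                           # context used for the first token


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly mark `token` as green, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 256.0 < GAMMA


def generate_watermarked(length: int, rng: random.Random) -> list[str]:
    """Stand-in for an LLM: draw 8 candidate tokens, emit a green one if any."""
    tokens: list[str] = []
    prev = START
    for _ in range(length):
        candidates = rng.sample(VOCAB, 8)
        green = [t for t in candidates if is_green(prev, t)]
        prev = green[0] if green else candidates[0]
        tokens.append(prev)
    return tokens


def z_score(tokens: list[str]) -> float:
    """One-proportion z-test: are there more green tokens than chance predicts?"""
    hits = 0
    prev = START
    for t in tokens:
        hits += is_green(prev, t)
        prev = t
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def substitution_attack(tokens: list[str], rate: float, rng: random.Random) -> list[str]:
    """Toy removal attack: replace each token with a random one with prob. `rate`."""
    return [rng.choice(VOCAB) if rng.random() < rate else t for t in tokens]


def detection_rate(rate: float, trials: int = 20, length: int = 200,
                   threshold: float = 4.0, seed: int = 0) -> float:
    """Fraction of attacked watermarked texts whose z-score exceeds `threshold`."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        text = generate_watermarked(length, rng)
        attacked = substitution_attack(text, rate, rng)
        detected += z_score(attacked) > threshold
    return detected / trials


if __name__ == "__main__":
    for rate in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"substitution rate {rate:.2f}: detection rate {detection_rate(rate):.2f}")
```

Sweeping the attack strength and recording the detection rate gives one robustness curve for one watermarker and attack pair. In this sketch, design choices such as `GAMMA`, the detection threshold, and how much context keys the green list are the sort of knobs whose effect on that curve a systematic evaluation would compare.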
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Digital and Cyber Forensics · Internet Traffic Analysis and Secure E-voting
