On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks
Zesen Liu, Tianshuo Cong, Xinlei He, Qi Li

TL;DR
This paper systematically evaluates the robustness of various watermarking schemes for machine-generated texts against different attacks, revealing vulnerabilities and emphasizing the need for more resilient solutions.
Contribution
The paper categorizes watermarking schemes and removal attacks into pre-text and post-text classes, runs extensive experiments across them, and reports on the robustness and imperceptibility of each scheme, highlighting areas for improvement.
Findings
Post-text attacks are more effective than pre-text attacks.
Pre-text watermarks are less perceptible than post-text watermarks and better preserve text quality.
Current watermarking schemes are vulnerable to combined attacks.
Abstract
Large Language Models (LLMs) excel in various applications, including text generation and complex tasks. However, the misuse of LLMs raises concerns about the authenticity and ethical implications of the content they produce, such as deepfake news, academic fraud, and copyright infringement. Watermarking techniques, which embed identifiable markers in machine-generated text, offer a promising solution to these issues by allowing for content verification and origin tracing. Unfortunately, the robustness of current LLM watermarking schemes under potential watermark removal attacks has not been comprehensively explored. In this paper, to fill this gap, we first systematically comb the mainstream watermarking schemes and removal attacks on machine-generated texts, and then we categorize them into pre-text (before text generation) and post-text (after text generation) classes so that we…
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection
