Mark My Words: Analyzing and Evaluating Language Model Watermarks
Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, David Wagner

TL;DR
This paper introduces Mark My Words, a benchmark for evaluating language model output watermarks across natural language tasks. It measures quality, size (the number of tokens needed to detect a watermark), and tamper resistance, and shows where current techniques are practical and where they fall short.
Contribution
It provides a systematic evaluation framework and benchmark for language model watermarking techniques, facilitating comparison and assessment of their effectiveness.
Findings
Kirchenbauer et al.'s scheme can watermark models like Llama 2 7B-chat with minimal quality loss.
Watermarks can be detected with fewer than 100 tokens.
Current watermarking schemes resist simple perturbations but struggle to watermark code generations.
Abstract
The capabilities of large language models have grown significantly in recent years and so too have concerns about their misuse. It is important to be able to distinguish machine-generated text from human-authored content. Prior works have proposed numerous schemes to watermark text, which would benefit from a systematic evaluation framework. This work focuses on LLM output watermarking techniques - as opposed to image or model watermarks - and proposes Mark My Words, a comprehensive benchmark for them under different natural language tasks. We focus on three main metrics: quality, size (i.e., the number of tokens needed to detect a watermark), and tamper resistance (i.e., the ability to detect a watermark after perturbing marked text). Current watermarking techniques are nearly practical enough for real-world use: Kirchenbauer et al. [33]'s scheme can watermark models like Llama 2…
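The size metric defined in the abstract (the number of tokens needed to detect a watermark) can be illustrated with a toy detector. The sketch below is not the benchmark's code. It assumes a simplified green/red-list test in the spirit of Kirchenbauer et al., where each token is pseudorandomly marked "green" based on the previous token and detection is a one-sided z-test on the green count. The names `is_green`, `z_score`, `p_value`, and `watermark_size`, the key string, the green fraction `GAMMA = 0.5`, and the threshold `alpha = 0.02` are all illustrative assumptions.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list


def is_green(prev_token: int, token: int, key: str = "demo-key") -> bool:
    """Hypothetical green-list rule: hash (key, previous token, token) to a coin flip."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] / 256.0 < GAMMA


def z_score(tokens: list[int]) -> float:
    """One-sided z-statistic for the number of green tokens in the sequence."""
    n = len(tokens) - 1  # the first token has no predecessor, so it is not scored
    if n <= 0:
        return 0.0
    green = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def watermark_size(tokens: list[int], alpha: float = 0.02):
    """Shortest prefix length whose p-value drops below alpha, or None if never detected."""
    for t in range(2, len(tokens) + 1):
        if p_value(z_score(tokens[:t])) < alpha:
            return t
    return None
```

Under these assumptions, a sequence in which every scored token is green has z-score equal to the square root of the number of scored tokens, so it crosses the threshold after only a handful of tokens. Real watermarked text mixes green and red tokens and therefore needs more. Tamper resistance could be probed in the same way by perturbing `tokens` before calling `watermark_size`.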
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Internet Traffic Analysis and Secure E-voting · Hate Speech and Cyberbullying Detection
Methods: LLaMA · Focus
