Mark My Words: Analyzing and Evaluating Language Model Watermarks

Julien Piet; Chawin Sitawarin; Vivian Fang; Norman Mu; David Wagner

arXiv:2312.00273 · cs.CR · October 15, 2024 · 2 cites

TL;DR

This paper introduces 'Mark My Words,' a benchmark for evaluating language model watermarks across various tasks, focusing on quality, detectability, and tamper resistance, highlighting current techniques' practicality and limitations.

Contribution

The paper provides a systematic evaluation framework and benchmark for language model watermarking techniques, so that schemes can be compared and assessed on common terms.

Findings

01 Existing schemes can watermark models like Llama 2 7B-chat with minimal quality loss.

02 Watermarks can be detected with fewer than 100 tokens.

03 Current watermarking schemes resist simple perturbations but struggle to watermark code generation.

Abstract

The capabilities of large language models have grown significantly in recent years, and so too have concerns about their misuse. It is important to be able to distinguish machine-generated text from human-authored content. Prior works have proposed numerous schemes to watermark text, which would benefit from a systematic evaluation framework. This work focuses on LLM output watermarking techniques (as opposed to image or model watermarks) and proposes Mark My Words, a comprehensive benchmark for them under different natural language tasks. We focus on three main metrics: quality, size (i.e., the number of tokens needed to detect a watermark), and tamper resistance (i.e., the ability to detect a watermark after perturbing marked text). Current watermarking techniques are nearly practical enough for real-world use: Kirchenbauer et al. [33]'s scheme can watermark models like Llama 2…
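To make the "size" metric concrete, here is a minimal sketch of how the number of tokens needed to detect a watermark could be measured. It assumes a green-list style detector in the spirit of Kirchenbauer et al., where each token is either "green" or not and detection is a one-sided z-test on the green count. The function names, the green fraction `gamma`, and the p-value threshold of 0.02 are illustrative assumptions, not values or APIs taken from the benchmark's code.

```python
import math
from typing import Optional, Sequence


def one_sided_p_value(green_count: int, n_tokens: int, gamma: float) -> float:
    """Chance of seeing at least `green_count` green tokens in unwatermarked
    text, using a normal approximation to the binomial (assumed test)."""
    mean = gamma * n_tokens
    std = math.sqrt(n_tokens * gamma * (1.0 - gamma))
    z = (green_count - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def watermark_size(
    green_flags: Sequence[bool],
    gamma: float = 0.5,         # assumed fraction of vocabulary that is green
    p_threshold: float = 0.02,  # assumed detection threshold
) -> Optional[int]:
    """Return the shortest prefix length at which the watermark is detected,
    or None if it is never detected within the given tokens."""
    green_count = 0
    for n_tokens, is_green in enumerate(green_flags, start=1):
        green_count += int(is_green)
        if one_sided_p_value(green_count, n_tokens, gamma) < p_threshold:
            return n_tokens
    return None
```

Under these assumptions, a text whose tokens are all green is flagged after 5 tokens (`watermark_size([True] * 20)` returns `5`), while a text with no excess of green tokens is never flagged and yields `None`. The same function could be applied to perturbed text to probe tamper resistance, since an attack that removes green tokens raises the size or prevents detection altogether.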

Code & Models

Repositories

wagner-group/markmywords
pytorch · Official

Datasets

wagner-group/MarkMyWords-tasks
dataset · 42 dl

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Internet Traffic Analysis and Secure E-voting · Hate Speech and Cyberbullying Detection

Methods: LLaMA · Focus