Stylometric Watermarks for Large Language Models

Georg Niess; Roman Kern

arXiv:2405.08400·cs.CL·May 15, 2024

TL;DR

This paper introduces a novel stylometric watermarking technique for large language models that uses linguistic features and semantic classification to reliably identify machine-generated text with minimal false positives.

Contribution

It presents a new watermarking method that combines stylometry with semantic zero-shot classification, improving robustness against attacks and supporting the accountability of LLMs.

Findings

01 False positive and false negative rates of 0.02 for texts of three or more sentences

02 Similar results under cyclic translation attacks for texts of seven or more sentences

03 Enhances LLM accountability and societal safety

Abstract

The rapid advancement of large language models (LLMs) has made it increasingly difficult to distinguish between text written by humans and machines. Addressing this, we propose a novel method for generating watermarks that strategically alters token probabilities during generation. Unlike previous works, this method uniquely employs linguistic features such as stylometry. Concretely, we introduce acrostica and sensorimotor norms to LLMs. Further, these features are parameterized by a key, which is updated every sentence. To compute this key, we use semantic zero-shot classification, which enhances resilience. In our evaluation, we find that for three or more sentences, our method achieves a false positive and false negative rate of 0.02. For the case of a cyclic translation attack, we observe similar results for seven or more sentences. This research is of particular interest for…
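The abstract describes the mechanism only at a high level: token probabilities are altered during generation, the stylometric features are parameterized by a key, and the key is recomputed every sentence from a semantic class. The following is a minimal sketch of that general idea for the acrostic feature, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names (`key_from_label`, `target_letter`, `bias_logits`), the use of SHA-256 to turn a class label into a key, the choice to favor tokens by their first letter, the bias strength `delta`, and the toy vocabulary. The zero-shot classifier itself is replaced by a hard-coded label.

```python
import hashlib
import math
import string

ALPHABET = string.ascii_lowercase


def key_from_label(label):
    """Map a semantic class label to an integer key.

    Assumption: the label would come from a zero-shot classifier run on the
    previous sentence; hashing it with SHA-256 is an illustrative choice.
    """
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def target_letter(key):
    """Pick the letter the next sentence should start with (hypothetical rule)."""
    return ALPHABET[key % len(ALPHABET)]


def bias_logits(logits, vocab, letter, delta=2.0):
    """Add `delta` to the logit of every token that starts with `letter`."""
    return [
        logit + delta if token.strip().lower().startswith(letter) else logit
        for logit, token in zip(logits, vocab)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary with one token per starting letter and uniform logits.
vocab = [c + "x" for c in ALPHABET]
logits = [0.0] * len(vocab)

# "sports" stands in for the class predicted for the previous sentence.
letter = target_letter(key_from_label("sports"))
probs = softmax(bias_logits(logits, vocab, letter))
```

In this sketch the bias is soft: the favored first letter becomes more likely but is not forced, so a detector that knows the key would look for a statistical excess of matching sentence openings across several sentences. This is consistent with the reported error rates improving as the number of sentences grows, although the paper's actual detection procedure is not given in this excerpt.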

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques