Stylometric Watermarks for Large Language Models
Georg Niess, Roman Kern

TL;DR
This paper introduces a novel stylometric watermarking technique for large language models that uses linguistic features and semantic classification to reliably identify machine-generated text with minimal false positives.
Contribution
It presents a new watermarking method that combines stylometry with semantic zero-shot classification, improving robustness against attacks and supporting the accountability of LLMs.
Findings
False positive and false negative rates of 0.02 for texts of three or more sentences
Similar rates under a cyclic translation attack for texts of seven or more sentences
Enhances LLM accountability and societal safety
Abstract
The rapid advancement of large language models (LLMs) has made it increasingly difficult to distinguish between text written by humans and machines. Addressing this, we propose a novel method for generating watermarks that strategically alters token probabilities during generation. Unlike previous works, this method uniquely employs linguistic features such as stylometry. Concretely, we introduce acrostica and sensorimotor norms to LLMs. Further, these features are parameterized by a key, which is updated every sentence. To compute this key, we use semantic zero-shot classification, which enhances resilience. In our evaluation, we find that for three or more sentences, our method achieves a false positive and false negative rate of 0.02. For the case of a cyclic translation attack, we observe similar results for seven or more sentences. This research is of particular interest for…
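The mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the label set, the label-to-letter mapping, the keyword-based `classify` stand-in for a real zero-shot classifier, the bias strength `delta`, and the choice to derive each key from the preceding sentence are all assumptions made for illustration, and sensorimotor norms are left out entirely. It shows only the acrostic idea: bias sentence-initial token logits toward a key letter, then check how often sentence openings match the expected key.

```python
import math

# Hypothetical labels and label-to-letter mapping (assumptions, not from the paper).
LABEL_TO_LETTER = {"science": "t", "sports": "a", "other": "s"}
KEYWORDS = {
    "science": {"model", "data", "experiment"},
    "sports": {"game", "team", "score"},
}


def classify(sentence):
    """Toy keyword matcher standing in for a semantic zero-shot classifier."""
    words = {w.strip(".,;:!?") for w in sentence.lower().split()}
    for label, keywords in KEYWORDS.items():
        if words & keywords:
            return label
    return "other"


def key_letter(previous_sentence):
    """Derive the key letter for the next sentence from the previous one."""
    return LABEL_TO_LETTER[classify(previous_sentence)]


def bias_logits(vocab, logits, letter, delta=4.0):
    """Add delta to the logit of every token that starts with the key letter."""
    return [
        logit + delta if token.lower().startswith(letter) else logit
        for token, logit in zip(vocab, logits)
    ]


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_rate(sentences, seed_sentence):
    """Fraction of sentences whose first letter matches the expected key letter."""
    if not sentences:
        return 0.0
    previous, hits = seed_sentence, 0
    for sentence in sentences:
        if sentence.strip().lower().startswith(key_letter(previous)):
            hits += 1
        previous = sentence  # the key is updated every sentence
    return hits / len(sentences)


# Usage: bias the first token of a new sentence toward the key letter.
vocab = ["The", "A", "Some", "tiny"]
letter = key_letter("The model fits the data.")
probs = softmax(bias_logits(vocab, [0.0, 0.0, 0.0, 0.0], letter))
```

A detector built on this idea would compare `match_rate` against a threshold. Intuitively, more sentences give more evidence, which fits the reported improvement at three or more sentences, though the paper's actual test statistic is not reproduced here.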
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Natural Language Processing Techniques
