Ensemble Watermarks for Large Language Models

Georg Niess; Roman Kern

arXiv:2411.19563·cs.CL·June 18, 2025

TL;DR

This paper introduces an ensemble watermark for large language models that combines several watermark features to raise detection accuracy and robustness against paraphrasing attacks, with the aim of improving accountability.

Contribution

It proposes a multi-feature ensemble watermarking method for LLMs that improves detection rates and paraphrase robustness over single-feature watermarks: 95% detection after paraphrasing, against 49% for the red-green baseline alone.

Findings

01 98% detection rate with the ensemble watermark

02 95% detection rate after a paraphrasing attack

03 Higher detection than the red-green baseline alone, which reaches 49% after paraphrasing

Abstract

As large language models (LLMs) reach human-like fluency, reliably distinguishing AI-generated text from human authorship becomes increasingly difficult. While watermarks already exist for LLMs, they often lack flexibility and struggle with attacks such as paraphrasing. To address these issues, we propose a multi-feature method for generating watermarks that combines multiple distinct watermark features into an ensemble watermark. Concretely, we combine acrostica and sensorimotor norms with the established red-green watermark to achieve a 98% detection rate. After a paraphrasing attack, the performance remains high, with a 95% detection rate. In comparison, the red-green feature alone as a baseline achieves a detection rate of 49% after paraphrasing. The evaluation of all feature combinations reveals that the ensemble of all three consistently has the highest detection rate across several…
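
To make the red-green baseline named in the abstract concrete, here is a minimal Python sketch of how a red-green detector is commonly scored: count tokens that fall in a pseudo-random "green" list and test that count against chance with a one-proportion z-score. This is not the authors' implementation. The names `is_green`, `green_z_score`, and `GAMMA`, the SHA-256 seeding on the previous token, and the use of whitespace-level string tokens are all assumptions made for illustration.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked green at each step


def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by `prev_token`.

    Hypothetical stand-in for a keyed vocabulary partition.
    """
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1) and compare against gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def green_z_score(tokens: list[str], gamma: float = GAMMA) -> float:
    """One-proportion z-score of the green-token count.

    Unwatermarked text is expected to score near 0; text generated with a
    bias toward green tokens scores well above it.
    """
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

A detector built this way would flag a text when the z-score exceeds a chosen threshold. Paraphrasing replaces tokens and pulls the green count back toward chance, which is the weakness the ensemble is meant to offset by adding the acrostica and sensorimotor features.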

Code & Models

Repositories

commodoreeu/master-generation (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques