A Nested Watermark for Large Language Models

Koichi Nagatsuka; Terufumi Morishita; Yasuhiro Sogawa

arXiv:2506.17308 · cs.CR · June 24, 2025

TL;DR

This paper introduces a nested watermarking technique for large language models that embeds two independent watermarks, enhancing traceability and robustness against key leaks without compromising text quality.

Contribution

The paper proposes a novel nested watermarking scheme with two independent keys, improving authorship attribution and robustness over existing single-key methods.

Findings

1. High detection accuracy for both watermarks
2. Maintains text fluency and quality
3. Robust against key leakage scenarios

Abstract

The rapid advancement of large language models (LLMs) has raised concerns regarding their potential misuse, particularly in generating fake news and misinformation. To address these risks, watermarking techniques for autoregressive language models have emerged as a promising means for detecting LLM-generated text. Existing methods typically embed a watermark by increasing the probabilities of tokens within a group selected according to a single secret key. However, this approach suffers from a critical limitation: if the key is leaked, it becomes impossible to trace the text's provenance or attribute authorship. To overcome this vulnerability, we propose a novel nested watermarking scheme that embeds two distinct watermarks into the generated text using two independent keys. This design enables reliable authorship identification even in the event that one key is compromised.…
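The single-key mechanism the abstract describes (boost the probabilities of a key-selected token group, then test how often that group appears) can be sketched as below. This is a toy in the style of the green-list scheme of Kirchenbauer et al. (2023), not the authors' code. Everything here is an assumption for illustration: `VOCAB_SIZE`, `GAMMA`, `DELTA`, the key strings, SHA-256 seeding on (key, previous token), the uniform "model" logits, and the one-proportion z-test. The two-key variant simply adds a bonus for each of two independent green lists. It is a hypothetical stand-in, not the nested construction the paper proposes, which the abstract does not specify.

```python
import hashlib
import math
import random

# Hypothetical toy parameters (not taken from the paper).
VOCAB_SIZE = 1000   # size of the toy vocabulary
GAMMA = 0.25        # fraction of the vocabulary placed in each green list
DELTA = 4.0         # logit bonus added to green tokens


def green_list(key: str, prev_token: int) -> set[int]:
    """Pseudo-randomly select a green subset of the vocabulary,
    seeded by the secret key and the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_next(logits: list[float], prev_token: int, keys: list[str],
                rng: random.Random) -> int:
    """Add DELTA once per key under which a token is green, then sample
    from the softmax of the boosted logits."""
    boosted = list(logits)
    for key in keys:
        for tok in green_list(key, prev_token):
            boosted[tok] += DELTA
    top = max(boosted)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - top) for x in boosted]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(keys: list[str], length: int = 200, seed: int = 0) -> list[int]:
    """Generate tokens from a toy 'model' whose logits are uniform."""
    rng = random.Random(seed)
    tokens = [0]  # hypothetical start token
    for _ in range(length):
        logits = [0.0] * VOCAB_SIZE
        tokens.append(sample_next(logits, tokens[-1], keys, rng))
    return tokens


def z_score(tokens: list[int], key: str) -> float:
    """One-proportion z-test: compare the number of green tokens under
    `key` with the GAMMA fraction expected from unwatermarked text."""
    n = len(tokens) - 1
    hits = sum(tok in green_list(key, prev) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    marked = generate(keys=["key-A", "key-B"])  # both hypothetical keys applied
    plain = generate(keys=[])                   # no watermark
    for key in ("key-A", "key-B", "key-C"):
        print(key, round(z_score(marked, key), 1), round(z_score(plain, key), 1))
```

In this toy, the `marked` sequence should score far above a threshold such as z = 4 under both `key-A` and `key-B`, while the unrelated `key-C` and the unwatermarked sequence stay near zero. Because each key is tested on its own, detection under `key-B` does not need `key-A`, which echoes the abstract's motivation; how the paper's nested scheme actually behaves when one key leaks is covered in the full text, not here.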