Watermarking Language Models through Language Models

Agnibh Dasgupta; Abdullah Tanvir; Xin Zhong

arXiv:2411.05091·cs.LG·June 23, 2025

Open Access

TL;DR

This paper presents a prompt-guided, input-level watermarking framework for large language models that does not require access to model internals, enabling dynamic, robust, and architecture-agnostic content marking and detection.

Contribution

The authors introduce a modular, prompt-guided watermarking framework that operates without access to model internals, adapts to various LLM architectures, and is resistant to attacks.

Findings

01. Watermark signals generalize across different architectures.

02. Watermarks remain robust under fine-tuning and distillation.

03. The framework effectively detects watermarked outputs in diverse scenarios.

Abstract

Watermarking the outputs of large language models (LLMs) is critical for provenance tracing, content regulation, and model accountability. Existing approaches often rely on access to model internals or are constrained by static rules and token-level perturbations. Moreover, the idea of steering generative behavior via prompt-based instruction control remains largely underexplored. We introduce a prompt-guided watermarking framework that operates entirely at the input level and requires no access to model parameters or decoding logits. The framework comprises three cooperating components: a Prompting LM that synthesizes watermarking instructions from user prompts, a Marking LM that generates watermarked outputs conditioned on these instructions, and a Detecting LM trained to classify whether a response carries an embedded watermark. This modular design enables dynamic watermarking that…
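
The three-component flow described in the abstract can be sketched as plain function composition. This is only an illustrative sketch, not the authors' implementation: the class `WatermarkPipeline`, the three `toy_*` functions, and the fixed `MARKERS` vocabulary are all hypothetical stand-ins. In the paper each component is a language model and the Detecting LM is a trained classifier; here simple rule-based functions take their place so the data flow is runnable without any model.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

# Hypothetical marker phrases. In the paper the watermarking instruction is
# synthesized by a language model; a fixed list is assumed here for illustration.
MARKERS = ("notably", "in essence", "broadly speaking")


def toy_prompting_lm(user_prompt: str) -> str:
    """Stand-in for the Prompting LM: derive an instruction from the user prompt."""
    digest = hashlib.sha256(user_prompt.encode("utf-8")).digest()
    marker = MARKERS[digest[0] % len(MARKERS)]
    return f"Begin the answer with the phrase '{marker}'."


def toy_marking_lm(user_prompt: str, instruction: str) -> str:
    """Stand-in for the Marking LM: produce a response that follows the instruction."""
    marker = instruction.split("'")[1]  # recover the phrase quoted in the instruction
    return f"{marker.capitalize()}, here is a reply to: {user_prompt}"


def toy_detecting_lm(response: str) -> bool:
    """Stand-in for the Detecting LM: sees only the response text, never the instruction."""
    return response.lower().startswith(MARKERS)


@dataclass
class WatermarkPipeline:
    # Each component is only called with text inputs, so no weights or
    # logits of the marking model are ever accessed.
    prompting_lm: Callable[[str], str]
    marking_lm: Callable[[str, str], str]
    detecting_lm: Callable[[str], bool]

    def generate(self, user_prompt: str) -> str:
        instruction = self.prompting_lm(user_prompt)
        return self.marking_lm(user_prompt, instruction)

    def detect(self, response: str) -> bool:
        return self.detecting_lm(response)


pipeline = WatermarkPipeline(toy_prompting_lm, toy_marking_lm, toy_detecting_lm)
marked = pipeline.generate("Explain the TCP handshake")
print(pipeline.detect(marked))
print(pipeline.detect("TCP uses a three-way handshake."))
```

Because the components are passed in as callables, any of the three stand-ins could in principle be swapped for a call to a real model without changing the pipeline, which mirrors the modular, input-level design the abstract describes.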

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · User Authentication and Security Systems · Internet Traffic Analysis and Secure E-voting