Watermarking Language Models through Language Models
Agnibh Dasgupta, Abdullah Tanvir, Xin Zhong

TL;DR
This paper presents a prompt-guided, input-level watermarking framework for large language models that does not require access to model internals, enabling dynamic, robust, and architecture-agnostic content marking and detection.
Contribution
The authors introduce a modular, prompt-guided watermarking framework that requires no access to model internals, adapts to different LLM architectures, and is reported to remain robust under fine-tuning and distillation.
Findings
Watermark signals generalize across different architectures.
Watermarks remain robust under fine-tuning and distillation.
The framework effectively detects watermarked outputs in diverse scenarios.
Abstract
Watermarking the outputs of large language models (LLMs) is critical for provenance tracing, content regulation, and model accountability. Existing approaches often rely on access to model internals or are constrained by static rules and token-level perturbations. Moreover, the idea of steering generative behavior via prompt-based instruction control remains largely underexplored. We introduce a prompt-guided watermarking framework that operates entirely at the input level and requires no access to model parameters or decoding logits. The framework comprises three cooperating components: a Prompting LM that synthesizes watermarking instructions from user prompts, a Marking LM that generates watermarked outputs conditioned on these instructions, and a Detecting LM trained to classify whether a response carries an embedded watermark. This modular design enables dynamic watermarking that…
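The three-component design described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the class name WatermarkPipeline, the threshold parameter, and the toy stand-in functions are assumptions made for illustration, not the authors' implementation. In the paper each component is a language model; here each is replaced by a trivial Python function so that only the data flow (user prompt, then watermarking instruction, then marked response, then detection) is visible.

```python
from dataclasses import dataclass
from typing import Callable

# Each "LM" is modeled as a plain callable; a real system would wrap model calls.
TextLM = Callable[[str], str]
ScoreLM = Callable[[str], float]


@dataclass
class WatermarkPipeline:
    """Hypothetical wiring of the Prompting, Marking, and Detecting LMs."""
    prompting_lm: TextLM
    marking_lm: TextLM
    detecting_lm: ScoreLM
    threshold: float = 0.5  # assumed decision threshold, not from the paper

    def generate(self, user_prompt: str) -> str:
        # Step 1: synthesize a watermarking instruction from the user prompt.
        instruction = self.prompting_lm(user_prompt)
        # Step 2: condition the Marking LM on instruction + prompt (input level only).
        return self.marking_lm(f"{instruction}\n\n{user_prompt}")

    def is_watermarked(self, response: str) -> bool:
        # Step 3: classify the response using only its text.
        return self.detecting_lm(response) >= self.threshold


# --- Toy stand-ins (assumptions for illustration, not real models) ---
MARKER = "notably"


def toy_prompting_lm(user_prompt: str) -> str:
    # A real Prompting LM would tailor the instruction to the prompt.
    return f"Use the word '{MARKER}' at least once in your answer."


def toy_marking_lm(full_prompt: str) -> str:
    instruction, _, question = full_prompt.partition("\n\n")
    answer = f"Here is a short answer to: {question}"
    if MARKER in instruction:
        # Follow the instruction by prepending the marker word.
        answer = f"{MARKER.capitalize()}, {answer[0].lower()}{answer[1:]}"
    return answer


def toy_detecting_lm(response: str) -> float:
    # A real Detecting LM would be a trained classifier returning a probability.
    return 1.0 if MARKER in response.lower() else 0.0


pipeline = WatermarkPipeline(toy_prompting_lm, toy_marking_lm, toy_detecting_lm)
marked = pipeline.generate("What is a watermark?")
plain = toy_marking_lm("\n\nWhat is a watermark?")  # no instruction supplied
print(marked, pipeline.is_watermarked(marked))
print(plain, pipeline.is_watermarked(plain))
```

The sketch never touches model parameters or decoding logits, which mirrors the input-level constraint stated in the abstract. A keyword marker is far weaker than an instruction-driven watermark learned by a trained detector; it is used here only to keep the example self-contained.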
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · User Authentication and Security Systems · Internet Traffic Analysis and Secure E-voting
