How Good is Post-Hoc Watermarking With Language Model Rephrasing?
Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, Alexandre Mourachko

TL;DR
This paper investigates post-hoc watermarking, in which a language model rephrases existing text while embedding a watermark, and analyzes how different strategies affect detectability and text quality. The approach works well on open-ended text but faces challenges with verifiable content such as code.
Contribution
It introduces a comprehensive analysis of post-hoc watermarking techniques, comparing various methods and uncovering their strengths and limitations across different text types.
Findings
The Gumbel-max scheme outperforms more recent alternatives under nucleus sampling.
Beam search significantly improves detectability and semantic fidelity.
Smaller models outperform larger ones in watermarking verifiable text like code.
Abstract
Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking* where an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents, or detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under…
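To make the Gumbel-max scheme named in the abstract concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes a keyed hash over a two-token context window, and it replaces the rephrasing LLM with a hypothetical toy next-token distribution (`toy_model_probs`). All function names, the window size, and the vocabulary size are illustrative assumptions.

```python
import hashlib
import math


def prf_uniform(key, context, token):
    """Pseudo-random value in (0, 1) derived from the key, context window, and token."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    # Keep 52 bits so the +0.5 offset is exact and the result stays strictly in (0, 1).
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52


def toy_model_probs(step, vocab_size=32):
    """Hypothetical stand-in for an LLM's next-token distribution (assumption)."""
    weights = []
    for v in range(vocab_size):
        h = hashlib.sha256(f"model|{step}|{v}".encode()).digest()
        weights.append(1 + int.from_bytes(h[:2], "big") % 7)
    total = sum(weights)
    return [w / total for w in weights]


def gumbel_max_pick(probs, key, context):
    """Pick argmax_v r_v ** (1 / p_v), computed in log space."""
    best_token, best_score = None, -math.inf
    for token, p in enumerate(probs):
        if p <= 0:
            continue
        score = math.log(prf_uniform(key, context, token)) / p
        if score > best_score:
            best_token, best_score = token, score
    return best_token


def generate(key, length, window=2, vocab_size=32):
    """Generate watermarked tokens from the toy model, starting from a fixed prompt."""
    tokens = [0] * window
    for step in range(length):
        context = tuple(tokens[-window:])
        tokens.append(gumbel_max_pick(toy_model_probs(step, vocab_size), key, context))
    return tokens


def detection_score(tokens, key, window=2):
    """Sum -log(1 - r) over distinct (context, token) pairs.

    Without the watermark each term behaves like an Exp(1) draw, so the
    score is close to the number of scored tokens; with it, the score is larger.
    """
    seen, total = set(), 0.0
    for t in range(window, len(tokens)):
        item = (tuple(tokens[t - window:t]), tokens[t])
        if item in seen:  # skip repeated n-grams so terms stay independent
            continue
        seen.add(item)
        total += -math.log(1.0 - prf_uniform(key, item[0], item[1]))
    return total, len(seen)


if __name__ == "__main__":
    text = generate("secret-key", 200)
    for k in ("secret-key", "wrong-key"):
        score, n = detection_score(text, k)
        # z-score against the no-watermark null (mean n, standard deviation sqrt(n)).
        print(k, round((score - n) / math.sqrt(n), 2))
```

In this sketch the correct key should yield a much larger z-score than a wrong key, because the selected tokens are exactly those whose keyed pseudo-random values are pushed toward 1. In the post-hoc setting described above, the toy distribution would be replaced by a rephrasing model conditioned on the text to protect.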
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Steganography and Watermarking Techniques
