MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
Yizhou Zhao, Zhiwei Steven Wu, Adam Block

TL;DR
MarkTune is a fine-tuning framework that enhances watermark detectability in open-weight language models while maintaining high text quality, addressing limitations of existing watermarking methods.
Contribution
It introduces a theoretically grounded, on-policy fine-tuning method that improves the quality-detectability trade-off for open-weight watermarking techniques like GaussMark.
Findings
MarkTune significantly improves watermark detection power without degrading text quality.
It achieves robustness against paraphrasing and fine-tuning attacks.
The method generalizes well across different datasets and models.
Abstract
Watermarking aims to embed hidden signals in generated text that can be reliably detected when given access to a secret key. Open-weight language models pose acute challenges for such watermarking schemes because the inference-time interventions that dominate contemporary approaches cannot be enforced once model weights are public. Existing watermarking techniques for open-weight models, such as the recently proposed GaussMark, typically rely on small modifications to model weights, which can yield signals detectable to those equipped with a secret key, but achieving detection power comparable to inference-time watermarks generally requires weight perturbations that noticeably reduce generation quality. We introduce MarkTune, a theoretically principled, on-policy fine-tuning framework that treats the GaussMark signal as a reward while simultaneously regularizing against degradation in…
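The weight-perturbation idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the GaussMark or MarkTune implementation: the "model" is a unigram softmax over a hypothetical 200-token vocabulary, the secret key is a Gaussian vector added to its logits, and the detection statistic (a normalized inner product between the key and the log-likelihood gradient of the base model) is an illustrative choice. The function names `sample_tokens` and `detection_z` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, n, rng):
    """Draw n i.i.d. tokens from the toy unigram model defined by `logits`."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=n)


def detection_z(tokens, base_logits, key, sigma):
    """Toy keyed test statistic (illustrative, not the paper's exact test).

    The gradient of sum_t log p(y_t) with respect to the base logits is
    (token counts - n * p). If the text is independent of the key, the
    normalized inner product with the Gaussian key is standard normal.
    """
    probs = softmax(base_logits)
    grad = [-len(tokens) * p for p in probs]
    for t in tokens:
        grad[t] += 1.0
    dot = sum(k * g for k, g in zip(key, grad))
    norm = math.sqrt(sum(g * g for g in grad))
    return dot / (sigma * norm)


rng = random.Random(0)
vocab_size, n_tokens, sigma = 200, 5000, 0.5  # assumed toy settings

base = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]   # unwatermarked weights
key = [rng.gauss(0.0, sigma) for _ in range(vocab_size)]  # secret key
marked = [b + k for b, k in zip(base, key)]               # watermarked weights

z_marked = detection_z(sample_tokens(marked, n_tokens, rng), base, key, sigma)
z_plain = detection_z(sample_tokens(base, n_tokens, rng), base, key, sigma)

print(f"z (watermarked text):   {z_marked:.2f}")
print(f"z (unwatermarked text): {z_plain:.2f}")
```

In this toy setting, a larger `sigma` makes the watermarked text easier to flag but moves the sampling distribution further from the base model, which is the quality-detectability tension the abstract describes.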
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Steganography and Watermarking Techniques · Advanced Malware Detection Techniques
