Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas

TL;DR
This paper introduces a provably forgery-resistant watermarking scheme for generative models that randomizes key selection per query, significantly reducing attack success rates without harming model utility.
Contribution
The authors propose a novel randomized key selection method that provably resists forgery regardless of how many watermarked samples attackers collect, provided the attackers cannot easily distinguish watermarks produced with different keys.
Findings
Forgery attack success rate reduced from near 100% to 2%.
Method does not degrade model utility.
Provably bounds attacker success rate.
Abstract
Watermarking enables GenAI providers to verify whether content was generated by their models. A watermark is a hidden signal in the content whose presence can be detected using a secret watermark key. Forgery attacks are a core security threat: adversaries insert the provider's watermark into content not produced by the provider, potentially damaging the provider's reputation and undermining trust. Existing defenses resist forgery by embedding many watermarks with multiple keys into the same content, which can degrade model utility. However, forgery remains a threat when attackers can collect sufficiently many watermarked samples. We propose a defense that is provably forgery-resistant independent of the amount of watermarked content collected by the attacker, provided they cannot easily distinguish watermarks from different keys. Our scheme does not further degrade model…
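The intuition behind randomizing the key per query can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' construction: it swaps in a simplified keyed "green list" watermark over integer tokens, and every name here (RandomizedKeyProvider, green_list, forge, VOCAB_SIZE), the 0.75 detection threshold, and the pooling attacker are hypothetical. The provider draws one of its secret keys at random for each query; an attacker who cannot tell which key produced which sample can only pool all collected samples, which dilutes each individual key's signal.

```python
import hashlib
import random

VOCAB_SIZE = 1000  # hypothetical toy vocabulary of integer token ids


def green_list(key: int) -> list[int]:
    """Toy keyed partition: a pseudorandom ~half of the vocabulary is 'green' for this key."""
    return [t for t in range(VOCAB_SIZE)
            if hashlib.sha256(f"{key}:{t}".encode()).digest()[0] % 2 == 0]


class RandomizedKeyProvider:
    """Hypothetical provider that draws one of its secret keys uniformly at random per query."""

    def __init__(self, keys: list[int], seed: int = 0):
        self.keys = keys
        self.rng = random.Random(seed)  # seeded only so the toy run is reproducible
        self.green = {k: green_list(k) for k in keys}
        self.green_sets = {k: set(g) for k, g in self.green.items()}

    def query(self, length: int = 200) -> list[int]:
        key = self.rng.choice(self.keys)  # fresh key per query; the user never sees which one
        # Toy watermarked "generation": sample only tokens from this key's green list.
        return [self.rng.choice(self.green[key]) for _ in range(length)]

    def score(self, key: int, tokens: list[int]) -> float:
        """Detection score: fraction of tokens in the key's green list (~0.5 if unwatermarked)."""
        return sum(t in self.green_sets[key] for t in tokens) / len(tokens)

    def detect(self, tokens: list[int], threshold: float = 0.75) -> list[int]:
        """Return every key whose watermark is detected in the content."""
        return [k for k in self.keys if self.score(k, tokens) >= threshold]


def forge(samples: list[list[int]], length: int, rng: random.Random) -> list[int]:
    """Toy forger: cannot tell keys apart, so it resamples tokens pooled from all samples."""
    pool = [t for s in samples for t in s]
    return [rng.choice(pool) for _ in range(length)]


if __name__ == "__main__":
    for keys in ([42], list(range(100, 116))):
        provider = RandomizedKeyProvider(keys, seed=1)
        samples = [provider.query() for _ in range(80)]
        forged = forge(samples, 200, random.Random(7))
        print(f"{len(keys)} key(s): keys detected in forgery = {provider.detect(forged)}")
```

With a single key, every pooled token is green for that key, so the forgery scores 1.0 and is detected. With 16 keys, each key contributes only about 1/16 of the pool, so a forged sample's score under any key sits near 0.5 + 0.5/16 ≈ 0.53 and typically falls below the threshold, while each genuine sample is still detected under exactly one key. A real deployment would draw keys with a cryptographically secure source such as Python's `secrets` module rather than a seeded `random.Random`.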
