Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko

TL;DR
This paper presents a black-box, one-shot method for forging watermarks in images using a preference model trained without real watermarks, demonstrating the vulnerability of current watermarking techniques.
Contribution
Introduces a preference model that assesses whether an image is watermarked, and a practical attack that forges watermarks from a single watermarked image without knowledge of the watermarking model.
Findings
Effective watermark forging across multiple models
Requires only one watermarked image for attack
Questions the robustness of existing watermarking methods
Abstract
Recent years have seen a surge in interest in digital content watermarking techniques, driven by the proliferation of generative models and increased legal pressure. With an ever-growing percentage of AI-generated content available online, watermarking plays an increasingly important role in ensuring content authenticity and attribution at scale. Many works have assessed the robustness of watermarking to removal attacks, yet watermark forging, the scenario in which a watermark is stolen from genuine content and applied to malicious content, remains underexplored. In this work, we investigate watermark forging in the context of widely used post-hoc image watermarking. Our contributions are as follows. First, we introduce a preference model to assess whether an image is watermarked. The model is trained using a ranking loss on purely procedurally generated images without any need…
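The abstract excerpt says the preference model is trained with a ranking loss but does not give its form. As a rough illustration only, the sketch below uses a pairwise logistic (Bradley-Terry style) ranking loss, a common choice for preference models. The function names, the scalar-score interface, and the choice of this particular loss are assumptions for illustration, not details taken from the paper.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise logistic ranking loss (assumed form, not the paper's).

    Computes log(1 + exp(-(score_preferred - score_rejected))).
    The loss is small when the preferred image scores higher than the
    rejected one, and grows roughly linearly when the order is wrong.
    """
    margin = score_preferred - score_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_ranking_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (preferred, rejected) score pairs."""
    return sum(pairwise_ranking_loss(p, r) for p, r in pairs) / len(pairs)
```

In this sketch each training pair would hold the model's scalar scores for two versions of the same image, with the first being the one the model should rank higher. Which version counts as "preferred" is a design choice that the excerpt does not specify.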
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
