Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models

Tomáš Souček; Sylvestre-Alvise Rebuffi; Pierre Fernandez; Nikola Jovanović; Hady Elsahar; Valeriu Lacatusu; Tuan Tran; Alexandre Mourachko

arXiv:2510.20468·cs.LG·October 24, 2025

Open Access

TL;DR

This paper presents a black-box, one-shot method for forging watermarks in images using a preference model trained without real watermarks, demonstrating the vulnerability of current watermarking techniques.

Contribution

Introduces a preference model that assesses whether an image is watermarked, and a practical attack that forges watermarks from a single watermarked image without any knowledge of the watermarking model.

Findings

01. Effective watermark forging across multiple models

02. Requires only one watermarked image for attack

03. Questions the robustness of existing watermarking methods

Abstract

Recent years have seen a surge in interest in digital content watermarking techniques, driven by the proliferation of generative models and increased legal pressure. With an ever-growing percentage of AI-generated content available online, watermarking plays an increasingly important role in ensuring content authenticity and attribution at scale. There have been many works assessing the robustness of watermarking to removal attacks, yet watermark forging, the scenario in which a watermark is stolen from genuine content and applied to malicious content, remains underexplored. In this work, we investigate watermark forging in the context of widely used post-hoc image watermarking. Our contributions are as follows. First, we introduce a preference model to assess whether an image is watermarked. The model is trained using a ranking loss on purely procedurally generated images without any need…
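The abstract mentions a ranking loss but, as excerpted here, does not give its form. As a minimal sketch only, the following assumes a pairwise logistic (Bradley-Terry style) ranking loss, a common choice for preference models; the function name and the convention that a higher score means "more likely watermarked" are illustrative assumptions, not details taken from the paper.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_other: float) -> float:
    """Assumed pairwise logistic ranking loss (not confirmed by the paper).

    Computes -log(sigmoid(score_preferred - score_other)), which is small
    when the preferred image is scored well above the other image.
    """
    margin = score_preferred - score_other
    # softplus(-margin) = log(1 + exp(-margin)), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical scores: the loss is lower when the ordering is correct.
correct_order = pairwise_ranking_loss(2.0, 0.0)
wrong_order = pairwise_ranking_loss(0.0, 2.0)
```

Under this assumption, training would push the score of the preferred image in each pair above that of its counterpart; how the pairs are built from procedurally generated images is not specified in the excerpt above.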

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis