Leveraging Optimization for Adaptive Attacks on Image Watermarks
Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, Florian Kerschbaum

TL;DR
This paper introduces an optimization-based framework for adaptive attacks on image watermarking, revealing vulnerabilities in current methods and highlighting the need for more rigorous robustness testing.
Contribution
It formulates adaptive watermark attacks as an optimization problem using surrogate keys, demonstrating their effectiveness against multiple watermarking methods.
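As a rough sketch of what "an optimization problem using surrogate keys" could look like, one generic way to write such an objective is given below. All notation here is an assumption introduced for illustration and is not taken from the paper, whose own objective may differ in form.

```latex
% Illustrative notation only (assumed, not the paper's):
%   A_\theta    : attack with tunable parameters \theta
%   \hat{\tau}  : surrogate key generated locally by the attacker
%   x           : image watermarked under \hat{\tau}
%   Detect      : (differentiable) detection score in [0, 1]
%   Q           : perceptual quality score of the attacked image relative to x
%   \lambda     : trade-off weight between evasion and quality
\theta^{*} \in \arg\max_{\theta}\;
\mathbb{E}_{\hat{\tau},\, x}
\Big[ \big(1 - \mathrm{Detect}(A_{\theta}(x),\, \hat{\tau})\big)
      + \lambda\, Q\big(A_{\theta}(x),\, x\big) \Big]
```

The expectation runs over keys the attacker can sample, so the real secret key never appears in the objective. The attack succeeds only if parameters tuned on surrogate keys transfer to the secret one.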
Findings
All surveyed watermarking methods can be broken with minimal image quality loss.
Optimized attacks require less than 1 GPU hour to significantly reduce detection accuracy.
The study underscores the importance of rigorous robustness evaluation against adaptive attackers.
Abstract
Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in unethical activities. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. When evaluating watermarking algorithms and their (adaptive) attacks, it is challenging to determine whether an adaptive attack is optimal, i.e., the best possible attack. We solve this problem by defining an objective function and then approach adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating…
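The core idea of tuning an attack against locally generated surrogate keys can be illustrated with a deliberately small toy. The sketch below is not the paper's method and does not model any of the surveyed watermarking schemes. It assumes a hypothetical additive watermark on the last few coordinates of a vector "image", a correlation detector, and an attack consisting of per-coordinate gains. All names (`keygen`, `train_attack`, `evaluate`, and the constants) are invented for this example.

```python
import random

DIM, BAND = 32, 8            # toy "image" size; the last BAND coordinates carry the mark
STRENGTH, THRESH = 1.0, 0.5  # embedding strength and detection threshold (assumed values)
CONTENT_STD = [1.0] * (DIM - BAND) + [0.1] * BAND  # little image content in the mark band

def keygen(rng):
    """Public key-generation routine: random signs on the watermark band, unit norm."""
    scale = BAND ** -0.5
    return [0.0] * (DIM - BAND) + [rng.choice((-scale, scale)) for _ in range(BAND)]

def sample_watermarked(rng, key):
    """Draw one toy image and embed the key additively."""
    return [rng.gauss(0.0, s) + STRENGTH * k for s, k in zip(CONTENT_STD, key)]

def score(image, key):
    """Correlation detector score; the image is flagged when score > THRESH."""
    return sum(p * k for p, k in zip(image, key))

def attack(image, gains):
    """Attack with tunable parameters: scale each coordinate by a gain in [0, 1]."""
    return [g * p for g, p in zip(gains, image)]

def train_attack(rng, steps=2000, lam=0.2, lr=0.05):
    """SGD on (detection score + lam * squared distortion), using surrogate keys only."""
    gains = [1.0] * DIM
    for _ in range(steps):
        surrogate = keygen(rng)                # attacker's own key, not the secret one
        xw = sample_watermarked(rng, surrogate)
        for i in range(DIM):
            # d/dg_i of  sum_j g_j*xw_j*key_j + lam * sum_j ((g_j - 1) * xw_j)^2
            grad = xw[i] * surrogate[i] + 2 * lam * (gains[i] - 1.0) * xw[i] ** 2
            gains[i] = min(1.0, max(0.0, gains[i] - lr * grad))
    return gains

def evaluate(seed=0, n=500):
    """Train on surrogate keys, then test against a secret key the attacker never saw."""
    attacker_rng, provider_rng = random.Random(seed), random.Random(seed + 1)
    secret = keygen(provider_rng)
    gains = train_attack(attacker_rng)
    hits_before = hits_after = 0
    err = energy = 0.0
    for _ in range(n):
        xw = sample_watermarked(provider_rng, secret)
        xa = attack(xw, gains)
        hits_before += score(xw, secret) > THRESH
        hits_after += score(xa, secret) > THRESH
        err += sum((a - w) ** 2 for a, w in zip(xa, xw))
        energy += sum(w * w for w in xw)
    return hits_before / n, hits_after / n, err / energy, gains

if __name__ == "__main__":
    before, after, distortion, _ = evaluate()
    print(f"detected before: {before:.2f}, after: {after:.2f}, "
          f"relative distortion: {distortion:.3f}")
```

In this toy the learned gains suppress only the coordinates where any key from the public key-generation routine could place a mark. An attack fitted to surrogate keys therefore transfers to the unseen secret key. Real watermarking methods and detectors are far less tractable, so the sketch illustrates only the shape of the optimization, not its difficulty.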
Peer Reviews
Decision: ICLR 2024 poster
Paper is well formatted. Topic is interesting. Good balance of theory and experiments.
Please improve readability. Please number all equations. Please discuss figures, tables, and algorithms clearly in the text. Please add a security analysis to known attacks in this domain.
First, on the aspect of the paper’s organization, this manuscript is well-organized and easy to follow. Second, on the aspect of clarity, the proposed method is clearly defined using schematics and pseudo-code descriptions. Third, this paper provides an approach to evaluating adaptive attacks, and the demonstration of their effectiveness provides a fresh perspective on the challenges faced in countering image manipulation.
The motivation and importance of the proposed method are not clear enough, e.g., what problems existed in previous work? Besides, the experimental comparison and discussion are weak. The experiment section should broaden its scope, compare with more advanced methods, and provide in-depth discussion.
The issue of watermarking the outputs of generative models is timely and interesting. The idea of training differentiable surrogates for arbitrary watermarking methods is an interesting threat model. The selection of baseline watermarking methods is reasonable and includes both "post-hoc" (low-perturbation) and "semantic" (high-perturbation) methods. The autoencoder/compression-based attack is interesting and seems to effectively remove watermarks while retaining high perceptual quality.
I think there is a terminology issue in the paper that could be confusing for readers. It appears the watermark "key" referenced in the paper more closely matches the concept of a watermark "detector" algorithm in methods such as RivaGAN and Tree-Rings; many methods use "key" and "message" interchangeably to refer to the hidden signal. If this is true, the authors' proposed training of differentiable surrogate "keys" can be understood as training differentiable surrogate detector networks.
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
Methods: Diffusion
