Leveraging Optimization for Adaptive Attacks on Image Watermarks

Nils Lukas; Abdulrahman Diaa; Lucas Fenaux; Florian Kerschbaum

arXiv:2309.16952 · cs.CR · January 23, 2024

TL;DR

This paper introduces an optimization-based framework for adaptive attacks on image watermarking, revealing vulnerabilities in current methods and highlighting the need for more rigorous robustness testing.

Contribution

It formulates adaptive watermark attacks as an optimization problem using surrogate keys, demonstrating their effectiveness against multiple watermarking methods.
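The surrogate-key idea can be illustrated with a deliberately simple toy. The sketch below is an assumption-laden illustration, not the paper's method or code: it uses a hypothetical linear correlation watermark on a zero-centered random "image", a hypothetical surrogate key that disagrees with the secret key on 5% of its entries, and plain gradient descent on a penalized objective (squared surrogate detection score plus a distortion term). All function names and constants are invented for this example.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def detect_score(image, key):
    """Toy detector: mean correlation between the image and a +/-1 key."""
    return dot(image, key) / len(image)


def adaptive_attack(watermarked, surrogate_key, steps=50, lr=0.5, quality_weight=0.1):
    """Gradient descent on
        0.5 * n * score(x')^2 + 0.5 * quality_weight * ||x' - x||^2,
    where score is computed with the attacker's surrogate key only.
    """
    attacked = list(watermarked)
    for _ in range(steps):
        score = detect_score(attacked, surrogate_key)
        # Gradient of the first term w.r.t. pixel i is score * key_i;
        # the second term pulls the attacked image back toward the input.
        attacked = [
            a - lr * (score * k + quality_weight * (a - w))
            for a, k, w in zip(attacked, surrogate_key, watermarked)
        ]
    return attacked


rng = random.Random(0)
n = 4096
strength, threshold = 0.1, 0.05  # hypothetical embedding strength and detection threshold

image = [rng.uniform(-0.5, 0.5) for _ in range(n)]
secret_key = [rng.choice((-1.0, 1.0)) for _ in range(n)]
# The attacker never sees secret_key; the surrogate differs on every 20th entry.
surrogate_key = [-k if i % 20 == 0 else k for i, k in enumerate(secret_key)]

watermarked = [p + strength * k for p, k in zip(image, secret_key)]
attacked = adaptive_attack(watermarked, surrogate_key)

score_before = detect_score(watermarked, secret_key)
score_after = detect_score(attacked, secret_key)
mse = sum((a - w) ** 2 for a, w in zip(attacked, watermarked)) / n
print(score_before > threshold, score_after > threshold, mse)
```

In this toy, the attack is optimized only against the surrogate key, yet the score under the secret key also drops, because the two keys are strongly correlated. Real watermarking methods and detectors are far more complex than a linear correlation, so this should be read as intuition only.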

Findings

01 All surveyed watermarking methods can be broken with minimal image quality loss.

02 Optimized attacks require less than 1 GPU hour to significantly reduce detection accuracy.

03 The study underscores the importance of rigorous robustness evaluation against adaptive attackers.

Abstract

Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in unethical activities. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. When evaluating watermarking algorithms and their (adaptive) attacks, it is challenging to determine whether an adaptive attack is optimal, i.e., the best possible attack. We solve this problem by defining an objective function and then approach adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating…
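One way to make the notion of an objective function concrete is sketched below. The notation is an assumption chosen for illustration and is not taken from the (truncated) abstract: $A_{\theta}$ is an attack with parameters $\theta$, $\hat{\tau}$ is a locally created surrogate key, $\mathrm{Verify}$ returns a detection score, $Q$ measures perceptual distortion between the original and the attacked image, and $\epsilon$ is a distortion budget.

```latex
\theta^{*} \in \arg\min_{\theta} \;
  \mathbb{E}_{x}\Big[ \mathrm{Verify}\big(A_{\theta}(x), \hat{\tau}\big) \Big]
\quad \text{subject to} \quad
  \mathbb{E}_{x}\Big[ Q\big(x, A_{\theta}(x)\big) \Big] \le \epsilon
```

Under this reading, a watermark is robust if no choice of $\theta$ drives the detection score low while staying within a small $\epsilon$.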

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

Paper is well formatted. Topic is interesting. Good balance of theory and experiments.

Weaknesses

Please improve readability. Please number all equations. Please discuss figures, tables, and algorithms clearly in the text. Please add a security analysis to known attacks in this domain.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

First, in terms of organization, the manuscript is well organized and easy to follow. Second, in terms of clarity, the proposed method is clearly defined using schematics and pseudo-code descriptions. Third, the paper provides an approach to evaluating adaptive attacks, and the demonstration of their effectiveness offers a fresh perspective on the challenges faced in countering image manipulation.

Weaknesses

The motivation and importance of the proposed method are not clear enough, e.g., what problems did previous works have? In addition, the experimental comparison and discussion are weak. The experiment section should expand the scope of discussion, compare with more advanced methods, and provide in-depth discussion.

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 5

Strengths

The issue of watermarking the outputs of generative models is timely and interesting. The idea of training differentiable surrogates for arbitrary watermarking methods is an interesting threat model. The selection of baseline watermarking methods is reasonable and includes both "post-hoc" (low-perturbation) and "semantic" (high-perturbation) methods. The autoencoder/compression-based attack is interesting and seems to effectively remove watermarks while retaining high perceptual quality.

Weaknesses

I think there is a terminology issue in the paper that could be confusing for readers. It appears the watermark "key" referenced in the paper more closely matches the concept of a watermark "detector" algorithm in methods such as RivaGAN and Tree-Rings; many methods often use "key" and "message" interchangeably to refer to the hidden signal. If this is true, the authors' proposed training of differentiable surrogate "keys" can be understood as training differentiable surrogate detector networks

Code & Models

Repositories

nilslukas/adaptive-watermark-attacks
PyTorch · Official

Videos

Leveraging Optimization for Adaptive Attacks on Image Watermarks · SlidesLive

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning

Methods: Diffusion