Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering
Haonan An, Guang Hua, Hangcheng Cao, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang

TL;DR
This paper uncovers a vulnerability in box-free watermarking for deep generative networks, demonstrating that watermarked outputs can be reverse-engineered to remove watermarks with high success and image quality.
Contribution
The authors introduce query-based reverse engineering methods to effectively remove watermarks from black-box deep generative models, exposing a critical security flaw.
Findings
Achieved a 100% watermark removal success rate.
Maintained high image quality with PSNR up to 34.69 dB.
Outperformed existing watermark removal attacks.
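Since the image-quality finding is reported as PSNR in dB, a short illustration of how that metric is commonly computed may help. This is a minimal sketch of the standard PSNR definition, assuming 8-bit images with a peak value of 255; the function name `psnr` and the `max_value` parameter are illustrative and not taken from the paper, whose exact evaluation setup is not described here.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumption: pixel values lie in [0, max_value] (255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Usage: compare a toy image against a copy that is off by one
# intensity level at every pixel.
clean = np.full((8, 8), 100, dtype=np.uint8)
off_by_one = clean + 1
score = psnr(clean, off_by_one)
```

Higher values mean the estimate is closer to the reference, so a higher PSNR after watermark removal indicates less visible damage to the image.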
Abstract
The intellectual property of deep generative networks (GNets) can be protected using a cascaded hiding network (HNet) that embeds watermarks (or marks) into GNet outputs, an approach known as box-free watermarking. Both GNet and HNet are encapsulated in a black box (called the operation network, or ONet), and only the generated and marked outputs from HNet are released to end users; such systems have been deemed secure. In this paper, we reveal an overlooked vulnerability in them. Specifically, we show that the hidden GNet outputs can still be reliably estimated via query-based reverse engineering, leaking the generated and unmarked images, despite the attacker's limited knowledge of the system. Our first attempt is to reverse-engineer an inverse model for HNet under the stringent black-box condition, for which we propose to exploit the query process with specially curated input images. While…
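The black-box setting described in the abstract can be illustrated with a toy sketch. Everything below is a hypothetical stand-in: `toy_gnet`, `toy_hnet`, the `ONet` class, and the constant-valued query images are assumptions made for illustration, not the paper's networks or its curated inputs. The sketch only shows the interface an attacker sees: inputs go in, marked outputs come out, and the unmarked GNet output stays hidden.

```python
import numpy as np

# Hypothetical watermark pattern (assumption: a fixed +/-1 pattern).
_rng = np.random.default_rng(0)
MARK = np.sign(_rng.standard_normal((4, 4)))

def toy_gnet(x):
    """Stand-in for the generative network (here a simple rescaling)."""
    return np.clip(0.9 * x, 0.0, 1.0)

def toy_hnet(y):
    """Stand-in for the hiding network: adds a faint mark to its input."""
    return np.clip(y + 0.01 * MARK, 0.0, 1.0)

class ONet:
    """Black box: only HNet(GNet(x)) is ever released to the user."""

    def __init__(self, gnet, hnet):
        self._gnet = gnet
        self._hnet = hnet

    def query(self, x):
        unmarked = self._gnet(x)      # hidden from the end user
        return self._hnet(unmarked)   # the only released output

# The attacker can choose inputs and observe marked outputs.
# Assumption: constant images stand in for "curated" queries.
onet = ONet(toy_gnet, toy_hnet)
queries = [np.full((4, 4), v) for v in (0.0, 0.5, 1.0)]
pairs = [(q, onet.query(q)) for q in queries]
```

The collected `pairs` are the only data available under the black-box condition; any inverse model for HNet would have to be learned from input and output pairs of this kind.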
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Steganography and Watermarking Techniques · Image Processing and 3D Reconstruction
