Decoder Gradient Shields: A Family of Provable and High-Fidelity Methods Against Gradient-Based Box-Free Watermark Removal

Haonan An, Guang Hua, Wei Du, Hangcheng Cao, Yihang Tao, Guowen Xu, Susanto Rahardja, and Yuguang Fang

arXiv:2601.11952·cs.CV·January 21, 2026

Open Access

TL;DR

This paper introduces Decoder Gradient Shields (DGSs), a set of provable defenses against gradient-based attacks on box-free neural network watermarks, ensuring high-fidelity watermark protection without compromising output quality.

Contribution

The paper proposes a novel family of defenses, DGSs, with closed-form solutions and provable performance, addressing a critical vulnerability in decoder-based watermarking methods.

Findings

01. DGSs achieve a 100% defense success rate across diverse scenarios.

02. DGSs prevent gradient-based watermark removal attacks while maintaining image quality.

03. The methods apply to a range of tasks, such as deraining and image generation.

Abstract

Box-free model watermarking has gained significant attention in deep neural network (DNN) intellectual property protection due to its model-agnostic nature and its ability to flexibly manage high-entropy image outputs from generative models. Typically operating in a black-box manner, it employs an encoder-decoder framework for watermark embedding and extraction. While existing research has focused primarily on the encoders for the robustness to resist various attacks, the decoders have been largely overlooked, leading to attacks against the watermark. In this paper, we identify one such attack against the decoder, where query responses are utilized to obtain backpropagated gradients to train a watermark remover. To address this issue, we propose Decoder Gradient Shields (DGSs), a family of defense mechanisms, including DGS at the output (DGS-O), at the input (DGS-I), and in the layers…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Steganography and Watermarking Techniques