On Function-Coupled Watermarks for Deep Neural Networks
Xiangyu Wen, Yu Li, Wei Jiang, Qiang Xu

TL;DR
This paper introduces a watermarking method for deep neural networks (DNNs) that uses in-distribution features and random weight masking to tightly couple the watermark with the model's normal function, so that removal attacks such as fine-tuning and pruning cannot erase it without degrading the model.
Contribution
The proposed method enhances watermark robustness by embedding watermarks through in-distribution features and weight masking, so that any removal attack that erases the watermark also degrades the model's performance on normal inputs.
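The summary does not spell out how the in-distribution triggers are built. One of the reviews describes trigger generation as "the concatenation or weighted overlay of two training set images", so here is a minimal NumPy sketch under that reading. The function names, the half-and-half stitching layout, the blend weight `alpha`, and the target label are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def direct_fusion(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Stitch the left half of img_a to the right half of img_b.

    Assumption: images are (H, W, C) float arrays of equal shape; the
    half-and-half layout is a hypothetical stand-in for "concatenation".
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    half = img_a.shape[1] // 2  # split along the width axis
    return np.concatenate([img_a[:, :half], img_b[:, half:]], axis=1)


def invisible_fusion(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Weighted overlay: (1 - alpha) * img_a + alpha * img_b.

    A small alpha keeps img_a dominant, so the trigger still looks like an
    ordinary image (alpha = 0.2 is a hypothetical default).
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * img_a + alpha * img_b


def make_trigger_set(imgs_a, imgs_b, target_label, fuse):
    """Fuse image pairs from two source classes and give every trigger the
    same (hypothetical) watermark label that the owner later checks for."""
    triggers = np.stack([fuse(a, b) for a, b in zip(imgs_a, imgs_b)])
    labels = np.full(len(triggers), target_label, dtype=np.int64)
    return triggers, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    class_a = rng.random((4, 32, 32, 3))  # stand-ins for real training images
    class_b = rng.random((4, 32, 32, 3))
    X_wm, y_wm = make_trigger_set(class_a, class_b, target_label=7, fuse=direct_fusion)
    print(X_wm.shape, y_wm)  # (4, 32, 32, 3) [7 7 7 7]
```

In a full pipeline the trigger set would be mixed into the clean training data; that step is omitted here.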
Findings
Achieves 100% watermark authentication success under removal attacks
Outperforms existing watermarking solutions in robustness
Effectively defends against fine-tuning and pruning attacks
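To make the pruning attack in these findings concrete, here is a minimal sketch of global unstructured magnitude pruning, a common form of this attack that zeroes the smallest-magnitude weights across all layers. The paper's exact pruning protocol (layer-wise vs. global, pruning ratios) is not given here, so `magnitude_prune` and its `ratio` parameter are illustrative assumptions; plain NumPy arrays stand in for a network's layers.

```python
import numpy as np


def magnitude_prune(weights, ratio):
    """Zero roughly the `ratio` fraction of weights with the smallest |w|,
    ranking all layers together (global unstructured pruning).

    Weights tied with the threshold are pruned too, so slightly more than
    `ratio` may be removed when ties occur. Inputs are not modified.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(ratio * flat.size)  # number of weights to remove
    if k == 0:
        return [w.copy() for w in weights]
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return [np.where(np.abs(w) <= threshold, 0.0, w) for w in weights]


if __name__ == "__main__":
    layers = [np.array([[0.1, -0.5], [0.3, 2.0]]), np.array([-0.05, 1.0])]
    pruned = magnitude_prune(layers, ratio=0.5)
    print(pruned[0].tolist(), pruned[1].tolist())  # [[0.0, -0.5], [0.0, 2.0]] [0.0, 1.0]
```

Robustness would then be judged by re-measuring both clean accuracy and trigger accuracy on the pruned model.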
Abstract
Well-performing deep neural networks (DNNs) generally require massive amounts of labelled data and computational resources for training. Various watermarking techniques have been proposed to protect such intellectual property (IP), wherein the DNN providers implant secret information into the model so that they can later claim IP ownership by retrieving their embedded watermarks with dedicated trigger inputs. While promising results are reported in the literature, existing solutions suffer from watermark removal attacks, such as model fine-tuning and model pruning. In this paper, we propose a novel DNN watermarking solution that can effectively defend against the above attacks. Our key insight is to enhance the coupling of the watermark and model functionalities such that removing the watermark would inevitably degrade the model's performance on normal inputs. To this end, unlike previous…
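The ownership claim in the abstract rests on retrieving the watermark with dedicated trigger inputs. A minimal sketch of that check, assuming black-box access to the suspect model's predicted labels: measure the fraction of triggers classified as their watermark labels (one reading of the "watermark authentication success" reported in the findings) and compare it with a decision threshold. The `predict` callable and the `threshold=0.9` default are hypothetical; the paper's actual verification rule may differ.

```python
from typing import Callable

import numpy as np


def watermark_success_rate(predict: Callable[[np.ndarray], np.ndarray],
                           triggers: np.ndarray, target_labels) -> float:
    """Fraction of trigger inputs that the suspect model maps to their watermark labels."""
    preds = np.asarray(predict(triggers))
    targets = np.asarray(target_labels)
    if preds.shape != targets.shape:
        raise ValueError("one prediction per trigger is required")
    return float(np.mean(preds == targets))


def claim_ownership(predict, triggers, target_labels, threshold=0.9) -> bool:
    """Hypothetical decision rule: claim ownership when the success rate reaches `threshold`."""
    return watermark_success_rate(predict, triggers, target_labels) >= threshold


if __name__ == "__main__":
    suspect = lambda x: np.full(len(x), 7)  # toy "suspect model" that always predicts class 7
    X_wm = np.zeros((5, 32, 32, 3))         # placeholder trigger images
    y_wm = np.full(5, 7)                    # their watermark labels
    print(watermark_success_rate(suspect, X_wm, y_wm))  # 1.0
    print(claim_ownership(suspect, X_wm, y_wm))         # True
```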
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
They did a great job of defining the watermarking problem and summarizing the key prior research. The proposed method also appears simple yet performs well across extensive experiments. They introduce a new training strategy that combines feature fusion, which fuses in-distribution features into watermark triggers, with a joint training approach that randomly masks model weights during training to spread the embedded watermark throughout the network. Therefore, it aims to strengthen the resistance of watermar
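The random weight masking described in this review can be illustrated on a toy model. This sketch trains a NumPy softmax-regression classifier where, at every step, a fresh random subset of weights (`mask_ratio`) is zeroed for the forward pass and receives no gradient, in the spirit of DropConnect, so the learned mapping cannot depend on one fixed small group of weights. It is an assumption-laden stand-in: the linear model, the per-step masking schedule, and all hyperparameters are hypothetical, and in the paper's setting `X` would contain both clean samples and fused trigger samples.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_with_random_masking(X, y, n_classes, mask_ratio=0.2, lr=0.5, epochs=200, seed=0):
    """Softmax regression trained with a fresh random weight mask per step.

    Each step zeroes a random `mask_ratio` fraction of W for the forward pass;
    those weights also receive no update, so the decision function (including
    any watermark mapping) cannot hinge on one fixed small subset of weights.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        keep = rng.random(W.shape) >= mask_ratio  # True = weight active this step
        P = softmax(X @ (W * keep) + b)
        grad_logits = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * ((X.T @ grad_logits) * keep)  # masked weights get zero gradient
        b -= lr * grad_logits.sum(axis=0)
    return W, b


def predict(W, b, X):
    """Inference uses the full, unmasked weights."""
    return np.argmax(X @ W + b, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: two separable clusters; in the paper's setting, fused trigger
    # samples with their watermark label would be added to X and y as well.
    X = np.vstack([rng.normal([2.0, 0.0], 0.2, (50, 2)),
                   rng.normal([-2.0, 0.0], 0.2, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    W, b = train_with_random_masking(X, y, n_classes=2)
    print("train accuracy:", np.mean(predict(W, b, X) == y))
```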
There are many typos, inconsistencies, and a few incomplete sentences in the paper. The citation format is also wrong throughout: citations are missing their parentheses (e.g., Li et al. instead of (Li et al.)), which hinders readability. This may be a minor or a major problem. In the provided code, some comments are written in Chinese, suggesting that the paper was written by Chinese-speaking authors; this may violate the anonymity requirement. Examples (translated from Chinese): "# Associate the dataset's original labels with the class indices needed during training." and "# Put all the prediction results into the same list", and many more… The prop
1. The proposed method makes sense conceptually. 2. The experimental results reported in this paper appear substantial.
1. The technical part of the paper is weak. The overall algorithmic pipeline appears to be rather naïve, with the generation of trigger samples relying solely on the concatenation or weighted overlay of two training set images. Furthermore, the paper lacks a theoretical explanation that justifies the proposed method. 2. The experiments are insufficient. The authors mentioned that their proposed invisible feature-fusion strategy could evade visual detection but did not give relevant experimental
+ Unlike previous trigger patterns, which are generated from samples outside the training distribution, the submission proposes combining in-distribution images as watermark triggers, i.e., the feature-fusion methods. Coupling the model watermark with the model's functionality improves robustness against fine-tuning-based attacks.
+ The random masking strategy spreads the watermark function across different neurons of the model and further enhances the defensive ability, whi
+ **Weak threat model**: Since it is generally considered that *model extraction is the de facto strongest attack for diminishing the watermark* [1], and much effort has gone into making black-box watermarking schemes robust against such an attack [1, Jia et al. (2021)], the submission should include model extraction attacks in its threat model. Otherwise, demonstrating resistance only to fine-tuning-based and pruning-based attacks is not convincing enough for t
1. The authors focus on intellectual property (IP) protection for deep learning models and propose two novel feature-fusion methods to mitigate the impact of removal attacks. By employing a random masking strategy, they further promote the spread of watermark information within the network. The efficacy of their approach is validated through experiments. 2. The proposed function-coupled watermarking concept for DNN IP protection is simple yet effective. The authors demonstrate the
1. In Section 3.1 (Feature-Fusion Design), the authors claim that their approach differs from previous trigger-pattern-based watermarking methods, which introduce out-of-distribution features. However, based on the visual results in Figure 1, the first method, the direct feature-fusion method, is easily recognized by the human eye as differing from the original dataset. The second method, the invisible feature-fusion method, although more covert, resembles techniques adopted in some black-box
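The visibility concern raised here, and the missing visual-detection experiment noted in another review, could be quantified with a simple image-similarity metric. A minimal sketch using peak signal-to-noise ratio (PSNR) between a clean image and a fused trigger, where higher PSNR means the trigger is closer to the original. PSNR is only a rough proxy for human perception, and the half-and-half stitch and 0.1-weighted blend are hypothetical stand-ins for the direct and invisible methods, not the paper's exact constructions.

```python
import numpy as np


def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32, 3))
    other = rng.random((32, 32, 3))  # second in-distribution image (stand-in)
    # Hypothetical stand-ins for the two fusion styles:
    stitched = np.concatenate([clean[:, :16], other[:, 16:]], axis=1)  # "direct"
    blended = 0.9 * clean + 0.1 * other                                 # "invisible"
    print(f"stitched PSNR: {psnr(clean, stitched):.1f} dB")
    print(f"blended  PSNR: {psnr(clean, blended):.1f} dB")
```

A perceptual metric or a human study would give stronger evidence than PSNR alone; this only shows how such a comparison could be set up.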
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
