Detect and remove watermark in deep neural networks via generative adversarial networks
Haoqi Wang, Mingfu Xue, Shichang Sun, Yushu Zhang, Jian Wang, Weiqiang Liu

TL;DR
This paper presents a GAN-based method to detect and effectively remove watermarks from deep neural networks, significantly reducing watermark presence with minimal impact on model accuracy.
Contribution
It introduces a novel GAN-based attack that can reverse and remove backdoor watermarks from DNNs, highlighting vulnerabilities in current watermarking techniques.
Findings
Removes about 98% of watermarks in DNNs
Minimal impact on model accuracy (less than 3% drop)
Effective on MNIST and CIFAR10 datasets
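The two headline numbers above (watermark removal and accuracy drop) can be made concrete with a small measurement sketch. This is not the authors' evaluation code. It assumes that "removal" means the share of trigger inputs that no longer receive the watermark's target label after the attack, and that the accuracy drop is the difference in clean-test accuracy before and after. All names here (watermark_retention, removal_report, the toy models and the toy trigger) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Sample = Tuple[float, ...]
Classifier = Callable[[Sample], int]


def accuracy(model: Classifier, inputs: Sequence[Sample], labels: Sequence[int]) -> float:
    """Fraction of clean inputs the model labels correctly."""
    if not inputs or len(inputs) != len(labels):
        raise ValueError("inputs and labels must be non-empty and the same length")
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(inputs)


def watermark_retention(model: Classifier, trigger_inputs: Sequence[Sample], target_label: int) -> float:
    """Fraction of trigger inputs still mapped to the watermark's target label."""
    if not trigger_inputs:
        raise ValueError("trigger_inputs must be non-empty")
    hits = sum(1 for x in trigger_inputs if model(x) == target_label)
    return hits / len(trigger_inputs)


def removal_report(
    before: Classifier,
    after: Classifier,
    clean_x: Sequence[Sample],
    clean_y: Sequence[int],
    trigger_x: Sequence[Sample],
    target_label: int,
) -> Dict[str, float]:
    """Compare a watermarked model with its post-attack version.

    Assumption: removal rate is 1 minus the retention measured after the attack.
    """
    return {
        "retention_before": watermark_retention(before, trigger_x, target_label),
        "watermark_removal_rate": 1.0 - watermark_retention(after, trigger_x, target_label),
        "accuracy_drop": accuracy(before, clean_x, clean_y) - accuracy(after, clean_x, clean_y),
    }


# Toy stand-ins for real networks: the label is the sign of the first feature,
# and the hypothetical trigger is "second feature equals 1.0".
def watermarked_model(x: Sample) -> int:
    if x[1] == 1.0:  # backdoor: trigger forces the target label
        return 1
    return 0 if x[0] < 0 else 1


def cleaned_model(x: Sample) -> int:
    return 0 if x[0] < 0 else 1  # trigger no longer has any effect


clean_x = [(-1.0, 0.0), (2.0, 0.0), (-0.5, 0.0), (3.0, 0.0)]
clean_y = [0, 1, 0, 1]
trigger_x = [(-1.0, 1.0), (-2.0, 1.0)]  # would be class 0 without the trigger

report = removal_report(watermarked_model, cleaned_model, clean_x, clean_y, trigger_x, target_label=1)
print(report)
```

With real models, the same two functions would be run on the held-out test set and on the owner's trigger set. A successful removal attack should drive retention toward zero while keeping the accuracy drop small.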
Abstract
Deep neural networks (DNNs) have achieved remarkable performance in various fields. However, training a DNN model from scratch requires a lot of computing resources and training data, which most individual users find difficult to obtain. Model copyright infringement has emerged as a problem in recent years. For instance, pre-trained models may be stolen or abused by illegal users without the authorization of the model owner. Recently, many works on protecting the intellectual property of DNN models have been proposed. Among these works, embedding backdoor-based watermarks into DNNs is one of the most widely used methods. However, when the DNN model is stolen, the backdoor-based watermark may face the risk of being detected and removed by an adversary. In this paper, we propose a scheme to detect and remove watermark in deep neural networks via generative…
