TL;DR
This paper introduces REFIT, a unified watermark removal framework for deep learning models. REFIT removes watermarks even when the adversary has only limited training data, which exposes vulnerabilities in current watermarking schemes.
Contribution
REFIT is a novel fine-tuning-based framework that combines elastic weight consolidation (EWC) with augmentation using unlabeled data (AU) to remove watermarks without any prior knowledge of the watermark.
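EWC regularizes fine-tuning with a quadratic penalty that discourages changes to parameters judged important, weighted by a diagonal Fisher information estimate: L(θ) = L_task(θ) + (λ/2) Σ_i F_i (θ_i − θ*_i)². The pure-Python sketch below shows that penalty and one gradient step under it. It assumes that θ* are the weights of the pre-trained (watermarked) model and that the Fisher diagonal is estimated as the mean of squared per-example log-likelihood gradients on the adversary's fine-tuning data. Those choices, and the helper names `diagonal_fisher`, `ewc_penalty`, `ewc_loss`, and `ewc_sgd_step`, are illustrative assumptions rather than the paper's code.

```python
from typing import List, Sequence


def diagonal_fisher(per_example_grads: Sequence[Sequence[float]]) -> List[float]:
    """Empirical diagonal Fisher: mean of squared per-example gradients.

    Assumption: gradients are of the log-likelihood, computed at the
    pre-trained (watermarked) weights on the available fine-tuning data.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(theta: Sequence[float], theta_star: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(f * (t - s) ** 2
                           for t, s, f in zip(theta, theta_star, fisher))


def ewc_loss(task_loss: float, theta: Sequence[float], theta_star: Sequence[float],
             fisher: Sequence[float], lam: float) -> float:
    """Fine-tuning objective: task loss plus the EWC penalty."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)


def ewc_sgd_step(theta: Sequence[float], task_grad: Sequence[float],
                 theta_star: Sequence[float], fisher: Sequence[float],
                 lam: float, lr: float) -> List[float]:
    """One SGD step; the penalty contributes lam * F_i * (theta_i - theta*_i)."""
    return [t - lr * (g + lam * f * (t - s))
            for t, g, s, f in zip(theta, task_grad, theta_star, fisher)]


# Usage: Fisher from two per-example gradients, then the penalised loss.
fisher = diagonal_fisher([[1.0, 0.0], [1.0, 2.0]])
loss = ewc_loss(0.5, [1.0, 3.0], [0.0, 1.0], fisher, lam=2.0)
```

Here `fisher` is `[1.0, 2.0]` and `loss` is `9.5`. Parameters with a large F_i are pulled back toward θ*, which is the mechanism the contribution above relies on to preserve model functionality while fine-tuning.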
Findings
REFIT effectively removes watermarks across a wide range of watermarking schemes.
EWC and AU reduce the amount of labeled data needed for watermark removal.
Unlabeled data drawn from a distribution different from the original training data can still be used for augmentation.
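A common way to use unlabeled inputs for fine-tuning is to label them with a model's own predictions. The sketch below assumes that AU does this with the pre-trained watermarked model and takes its top-1 class. Because the labels come from the model rather than from ground truth, the unlabeled inputs need not share the original training distribution. The function names `argmax` and `augment_with_unlabeled` and the `predict_probs` callable are hypothetical placeholders, not REFIT's interface.

```python
from typing import Callable, List, Sequence, Tuple

Example = Sequence[float]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest entry (ties resolve to the first index)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def augment_with_unlabeled(
    labeled: Sequence[Tuple[Example, int]],
    unlabeled: Sequence[Example],
    predict_probs: Callable[[Example], Sequence[float]],
) -> List[Tuple[Example, int]]:
    """Pseudo-label each unlabeled input with the model's top-1 class.

    Assumption: `predict_probs` wraps the pre-trained (watermarked) model and
    returns class probabilities for one input.
    """
    pseudo = [(x, argmax(predict_probs(x))) for x in unlabeled]
    return list(labeled) + pseudo


# Usage with a toy stand-in "model" that thresholds the first feature.
toy_model = lambda x: [0.2, 0.8] if x[0] > 0 else [0.9, 0.1]
augmented = augment_with_unlabeled([([0.0], 1)], [[1.0], [-1.0]], toy_model)
```

`augmented` holds the original labeled pair followed by `([1.0], 1)` and `([-1.0], 0)`. The enlarged set would then be used for fine-tuning, optionally together with the EWC penalty.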
Abstract
Training deep neural networks from scratch can be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect pre-trained deep neural networks from potential copyright infringement. However, these techniques may be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario where the adversary has limited training data, which has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising the model functionality under this weak threat model, we propose two techniques…
Taxonomy
Methods: Elastic Weight Consolidation
