Solving the Capsulation Attack against Backdoor-based Deep Neural Network Watermarks by Reversing Triggers
Fangqi Li, Shilin Wang, Yun Zhu

TL;DR
This paper introduces a capsulation attack that can invalidate most backdoor-based DNN watermarks and proposes a new scheme that is resistant to such attacks by reversing encoding and randomizing triggers.
Contribution
It presents the capsulation attack and a novel watermarking scheme for DNN models that is more secure against this attack.
Findings
The capsulation attack effectively invalidates existing watermarks.
The proposed scheme resists the capsulation attack by reversing encoding.
CAScore measures a watermark's security against the capsulation attack.
Abstract
Backdoor-based watermarking schemes were proposed to protect the intellectual property of artificial intelligence models, especially deep neural networks, under the black-box setting. Compared with ordinary backdoors, backdoor-based watermarks need to digitally incorporate the owner's identity, which adds extra requirements to the trigger generation and verification programs. Moreover, these concerns create additional security risks once the watermarking scheme has been published as a forensics tool or the owner's evidence has been eavesdropped on. This paper proposes the capsulation attack, an efficient method that can invalidate most established backdoor-based watermarking schemes without sacrificing the pirated model's functionality. By encapsulating the deep neural network with a rule-based or Bayes filter, an adversary can block ownership probing and reject the ownership…
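The core idea of the capsulation attack, wrapping the stolen network in a filter that intercepts suspected ownership probes, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' method: the paper's actual rule-based and Bayes filters are not described in this summary, so the z-score rule on mean input intensity, the toy model, and all names (`make_capsulated_model`, `watermark_accuracy`, `toy_watermarked_model`, `z_max`, `reject_label`) are hypothetical. Verification is modeled as the usual black-box check that trigger inputs return the owner's target labels.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Input = Sequence[float]
Model = Callable[[Input], int]


def make_capsulated_model(model: Model, clean_inputs: List[Input],
                          z_max: float = 3.0, reject_label: int = -1) -> Model:
    """Wrap `model` with a hypothetical rule-based filter (illustrative only)."""
    # Statistics of clean data the adversary is assumed to hold.
    means = [mean(x) for x in clean_inputs]
    mu = mean(means)
    sigma = pstdev(means) or 1e-8  # avoid division by zero

    def capsulated(x: Input) -> int:
        # Score the query by how many standard deviations its mean
        # intensity lies from the clean-data mean.
        z = abs(mean(x) - mu) / sigma
        if z > z_max:
            return reject_label  # suspected ownership probe: hide the backdoor
        return model(x)  # ordinary input: behave exactly like the pirated model

    return capsulated


def watermark_accuracy(model: Model, triggers: List[Input], targets: List[int]) -> float:
    """Fraction of trigger queries answered with the owner's target label."""
    hits = sum(model(x) == y for x, y in zip(triggers, targets))
    return hits / len(triggers)


# Hypothetical toy "pirated" model whose backdoor maps very bright inputs to label 7.
def toy_watermarked_model(x: Input) -> int:
    m = mean(x)
    if m > 0.9:
        return 7  # hypothetical watermark response
    return 0 if m < 0.5 else 1


clean = [[0.2, 0.4], [0.3, 0.5], [0.4, 0.6], [0.5, 0.7]]
triggers = [[0.95, 1.0], [1.0, 1.0]]
targets = [7, 7]

capsulated = make_capsulated_model(toy_watermarked_model, clean)
print(watermark_accuracy(toy_watermarked_model, triggers, targets))  # 1.0
print(watermark_accuracy(capsulated, triggers, targets))  # 0.0
print(all(capsulated(x) == toy_watermarked_model(x) for x in clean))  # True
```

The wrapped model answers clean queries exactly as before, so functionality is preserved, while every trigger query is rejected and watermark verification fails. This toy filter succeeds only because the triggers here are statistically unusual compared with the clean data.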
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Digital Media Forensic Detection
