Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution
Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, Kui Ren

TL;DR
This paper introduces Explanation as a Watermark (EaaW), a watermarking method that embeds ownership verification signals into a model's feature attribution explanations rather than its predictions. This avoids implanting harmful backdoors and enables multi-bit watermarks.
Contribution
The paper proposes a new watermarking paradigm that embeds multi-bit ownership signals into explanations rather than predictions, improving safety and robustness over existing zero-bit backdoor methods.
Findings
EaaW effectively embeds multi-bit watermarks into explanations.
The method is harmless and does not alter model predictions.
EaaW shows resistance to various attacks.
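To make the idea of reading a multi-bit watermark from an explanation concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a LIME-style attribution (a ridge regression fitted on randomly masked copies of a trigger input) and assumes the watermark bits are read from the signs of the attribution weights. The names `lime_style_attribution`, `extract_bits`, `bit_match_rate`, and the toy linear model are all hypothetical and exist only for illustration.

```python
import numpy as np

def lime_style_attribution(predict, x, n_samples=256, ridge=1e-3, seed=0):
    """Assumed LIME-style explainer: fit a ridge-regularized linear surrogate
    from random binary feature masks to the model output on masked inputs."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(n_samples, x.size)).astype(float)
    outputs = np.array([predict(x * m) for m in masks])
    # Append a column of ones so the surrogate has an intercept term.
    design = np.hstack([masks, np.ones((n_samples, 1))])
    gram = design.T @ design + ridge * np.eye(design.shape[1])
    coef = np.linalg.solve(gram, design.T @ outputs)
    return coef[:-1]  # drop the intercept, keep one weight per feature

def extract_bits(attribution):
    """Assumed extraction rule: a positive attribution reads as bit 1, else 0."""
    return (attribution > 0).astype(int)

def bit_match_rate(extracted, watermark):
    """Fraction of extracted bits that agree with the owner's watermark."""
    return float(np.mean(extracted == watermark))

# Hypothetical toy "model": a fixed linear scorer standing in for a network
# whose explanation on the trigger input already encodes the watermark.
toy_weights = np.array([1.5, -2.0, 0.7, -0.3, 2.2, -1.1, 0.9, -0.6])

def toy_predict(v):
    return float(toy_weights @ v)

trigger = np.ones(8)                              # hypothetical trigger input
watermark = np.array([1, 0, 1, 0, 1, 0, 1, 0])    # owner's 8-bit watermark

attr = lime_style_attribution(toy_predict, trigger)
bits = extract_bits(attr)
rate = bit_match_rate(bits, watermark)
print(bits, rate)
```

In this sketch, verification compares the extracted bits against the owner's bit string, so the evidence is a multi-bit match rate rather than a single misclassified sample. The predictions on the trigger input are never required to be wrong, which is the sense in which an explanation-based watermark can be harmless.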
Abstract
Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties 'inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks: harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors (i.e., backdoors) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity. In this paper, we argue that both limitations stem…
Taxonomy
Topics: Digital and Cyber Forensics · Advanced Malware Detection Techniques · Advanced Steganography and Watermarking Techniques
