Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

Shuo Shao; Yiming Li; Hongwei Yao; Yiling He; Zhan Qin; Kui Ren

arXiv:2405.04825 · cs.CR · September 11, 2024 · 1 citation

TL;DR

This paper introduces a novel watermarking method called Explanation as a Watermark (EaaW) that embeds ownership verification signals into feature attribution explanations. This avoids the harmful backdoors used by existing approaches and enables multi-bit watermarking of models.

Contribution

The paper proposes a new watermarking paradigm that embeds multi-bit ownership signals into explanations rather than predictions, improving safety and robustness over existing zero-bit backdoor methods.
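The verification idea can be illustrated with a minimal sketch. It assumes, hypothetically, that each watermark bit is read from the sign of one feature-attribution value computed for a trigger input, and that ownership is claimed when enough bits match the owner's bit string. The function names, the sign encoding, and the 0.9 threshold are illustrative assumptions, not the authors' implementation in shaoshuo-ss/eaaw. The sketch also computes how likely an unrelated model is to reach a given number of matching bits purely by chance, which shows why a multi-bit signal is less ambiguous than a zero-bit backdoor check.

```python
from math import comb
from typing import List, Sequence, Tuple


def extract_bits(attribution: Sequence[float]) -> List[int]:
    """Hypothetical decoder: one bit per feature, 1 if its attribution is
    strictly positive, else 0 (zero attributions decode as 0)."""
    return [1 if a > 0 else 0 for a in attribution]


def bit_accuracy(extracted: Sequence[int], watermark: Sequence[int]) -> float:
    """Fraction of watermark bits recovered correctly."""
    if len(extracted) != len(watermark):
        raise ValueError("extracted bits and watermark must have equal length")
    matches = sum(e == w for e, w in zip(extracted, watermark))
    return matches / len(watermark)


def chance_match_probability(k: int, n: int) -> float:
    """Probability that n uniformly random, independent bits agree with a
    fixed n-bit watermark in at least k positions (binomial tail)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def verify(
    attribution: Sequence[float],
    watermark: Sequence[int],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> Tuple[bool, float]:
    """Claim ownership if the decoded bits match the watermark well enough."""
    acc = bit_accuracy(extract_bits(attribution), watermark)
    return acc >= threshold, acc


if __name__ == "__main__":
    watermark = [1, 0, 1, 1, 0, 0, 1, 0]
    # Attribution of a suspect model on the trigger input (made-up values).
    attribution = [0.3, -0.1, 0.7, 0.2, -0.5, -0.05, 0.4, -0.2]
    print(verify(attribution, watermark))
    print(chance_match_probability(8, 8))
```

With the 8-bit watermark above, a random model matches every bit with probability 1/256; a 64-bit watermark lowers this to 2^-64. A zero-bit backdoor check offers no comparable handle, because any misclassified sample can pass it.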

Findings

1. EaaW effectively embeds multi-bit watermarks into explanations.
2. The method is harmless: it does not alter model predictions.
3. EaaW shows resistance to various attacks.

Abstract

Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners use it to identify whether a given suspicious third-party model was stolen from them by examining whether it has particular properties 'inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods for implanting such properties in released models. However, backdoor-based methods have two fatal drawbacks: harmfulness and ambiguity. The former means that they introduce maliciously controllable misclassification behaviors (i.e., a backdoor) into the watermarked released models. The latter means that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity. In this paper, we argue that both limitations stem…

Code & Models

Repositories

shaoshuo-ss/eaaw (PyTorch, official)

Taxonomy

Topics: Digital and Cyber Forensics · Advanced Malware Detection Techniques · Advanced Steganography and Watermarking Techniques