On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective

Aoting Hu; Yanzhi Chen; Renjie Xie; Adrian Weller

arXiv:2409.06130·cs.CR·September 11, 2024

TL;DR

This paper analyzes the vulnerabilities of backdoor-based model watermarking using information theory and proposes a new in-distribution watermarking method that is more robust against attacks while maintaining model accuracy.

Contribution

It introduces an information-theoretic analysis of watermarking vulnerabilities and proposes a novel in-distribution watermark embedding scheme to improve robustness.

Findings

01. Current out-distribution trigger-sets are vulnerable to white-box attacks.

02. The proposed IWE method enhances robustness against watermark removal.

03. Experiments show negligible accuracy loss (< 0.1%) with improved security.

Abstract

Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security. Model watermarking is a powerful technique for protecting ownership of machine learning models, yet its reliability has been challenged by recent watermark removal attacks. In this work, we investigate why existing watermark embedding techniques, particularly those based on backdooring, are vulnerable. Through an information-theoretic analysis, we show that the resilience of watermarking against erasure attacks hinges on the choice of trigger-set samples, where current uses of out-distribution trigger-sets are inherently vulnerable to white-box adversaries. Based on this discovery, we propose a novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the limitations of existing methods. To further minimise the gap to clean models, we…
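
For readers unfamiliar with the term, the trigger-set idea the abstract refers to can be sketched as follows. This is a minimal illustration of generic backdoor-based watermark verification, not the paper's IWE scheme: the owner keeps a secret set of inputs with chosen labels and claims ownership when a suspect model reproduces those labels often enough. All names here (`trigger_set_match_rate`, `claims_ownership`, the toy models, and the 0.9 threshold) are assumptions made for illustration and do not come from the paper or its repository.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, ...]
Model = Callable[[Sample], int]

def trigger_set_match_rate(model: Model,
                           trigger_set: Sequence[Tuple[Sample, int]]) -> float:
    """Fraction of trigger inputs on which the model outputs the owner's label."""
    if not trigger_set:
        raise ValueError("trigger_set must not be empty")
    hits = sum(1 for x, y in trigger_set if model(x) == y)
    return hits / len(trigger_set)

def claims_ownership(model: Model,
                     trigger_set: Sequence[Tuple[Sample, int]],
                     threshold: float = 0.9) -> bool:
    """Hypothetical decision rule: the threshold value is an assumption."""
    return trigger_set_match_rate(model, trigger_set) >= threshold

# Toy stand-ins for real networks (illustrative only).
def clean_model(x: Sample) -> int:
    # Predicts class 1 when the features sum to a positive value.
    return int(sum(x) > 0)

# Secret trigger inputs paired with labels the clean model would not give.
TRIGGER_SET = [
    ((1.0, 2.0), 0),
    ((-3.0, -1.0), 1),
    ((0.5, 0.5), 0),
    ((-2.0, -2.0), 1),
]

def watermarked_model(x: Sample) -> int:
    # Memorises the trigger labels and otherwise behaves like the clean model.
    memorised = dict(TRIGGER_SET)
    return memorised.get(x, clean_model(x))

if __name__ == "__main__":
    print(claims_ownership(watermarked_model, TRIGGER_SET))
    print(claims_ownership(clean_model, TRIGGER_SET))
```

A removal attack, in these terms, tries to drive the match rate on the trigger set below the threshold while leaving behaviour on ordinary inputs intact.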

Code & Models

Repositories

katerina828/iwe (Official)

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Digital Rights Management and Security · Internet Traffic Analysis and Secure E-voting