On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective
Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller

TL;DR
This paper analyzes the vulnerabilities of backdoor-based model watermarking using information theory and proposes a new in-distribution watermarking method that is more robust against attacks while maintaining model accuracy.
Contribution
It introduces an information-theoretic analysis of watermarking vulnerabilities and proposes a novel in-distribution watermark embedding scheme to improve robustness.
Findings
Current out-distribution trigger-sets are vulnerable to white-box attacks.
The proposed IWE method enhances robustness against watermark removal.
Experiments show negligible accuracy loss (< 0.1%) with improved security.
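As background for the findings above, here is a minimal sketch of how backdoor-based watermark verification is commonly framed: the owner keeps a secret trigger-set of (input, label) pairs and claims ownership if a suspect model reproduces enough of those labels. This is a generic illustration, not the paper's IWE scheme. The function names, the toy models, and the 0.9 threshold are all hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

Sample = Tuple[Hashable, int]  # (trigger input, secret label) -- illustrative type


def trigger_match_rate(predict: Callable[[Hashable], int],
                       trigger_set: Sequence[Sample]) -> float:
    """Fraction of trigger inputs on which the model outputs the secret label."""
    if not trigger_set:
        raise ValueError("trigger_set must not be empty")
    hits = sum(1 for x, y in trigger_set if predict(x) == y)
    return hits / len(trigger_set)


def verify_ownership(predict: Callable[[Hashable], int],
                     trigger_set: Sequence[Sample],
                     threshold: float = 0.9) -> bool:
    """Claim ownership if the match rate reaches the (assumed) threshold."""
    return trigger_match_rate(predict, trigger_set) >= threshold


# Toy trigger-set: 20 inputs with secret labels in {0, ..., 9}.
trigger_set = [(i, (i * 7) % 10) for i in range(20)]


def watermarked_model(x: int) -> int:
    # Stand-in for a model that has memorised the trigger-set.
    return (x * 7) % 10


def clean_model(x: int) -> int:
    # Stand-in for an unrelated model that always predicts class 0.
    return 0


if __name__ == "__main__":
    print(trigger_match_rate(watermarked_model, trigger_set))
    print(trigger_match_rate(clean_model, trigger_set))
    print(verify_ownership(clean_model, trigger_set))
```

A removal attack in this framing succeeds when it pushes the match rate of a stolen model below the threshold without noticeably hurting accuracy on normal inputs.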
Abstract
Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security. Model watermarking is a powerful technique for protecting ownership of machine learning models, yet its reliability has recently been challenged by watermark removal attacks. In this work, we investigate why existing watermark embedding techniques, particularly those based on backdooring, are vulnerable. Through an information-theoretic analysis, we show that the resilience of watermarking against erasure attacks hinges on the choice of trigger-set samples, where current uses of out-distribution trigger-sets are inherently vulnerable to white-box adversaries. Based on this discovery, we propose a novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the limitations of existing methods. To further minimise the gap to clean models, we…
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Digital Rights Management and Security · Internet Traffic Analysis and Secure E-voting
