Entangled Watermarks as a Defense against Model Extraction
Hengrui Jia, Christopher A. Choquette-Choo, Varun Chandrasekaran, Nicolas Papernot

TL;DR
This paper proposes Entangled Watermarking Embeddings (EWE), a novel method that embeds watermarks into models by entangling them with legitimate features, making removal difficult while preserving model accuracy.
Contribution
The paper introduces EWE, a new watermarking technique that entangles watermarks with legitimate features, improving robustness against removal and enabling efficient ownership verification.
Findings
Supports an ownership claim with 95% confidence using fewer than 100 queries to the stolen copy.
Costs less than 0.81 percentage points of accuracy on average.
Effective across multiple datasets including MNIST and CIFAR-10.
Abstract
Machine learning involves expensive data collection and training procedures. Model owners may be concerned that valuable intellectual property can be leaked if adversaries mount model extraction attacks. As it is difficult to defend against model extraction without sacrificing significant prediction accuracy, watermarking instead leverages unused model capacity to have the model overfit to outlier input-output pairs. Such pairs are watermarks, which are not sampled from the task distribution and are only known to the defender. The defender then demonstrates knowledge of the input-output pairs to claim ownership of the model at inference. The effectiveness of watermarks remains limited because they are distinct from the task distribution and can thus be easily removed through compression or other forms of knowledge transfer. We introduce Entangled Watermarking Embeddings (EWE). Our…
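The ownership claim described in the abstract can be read as a hypothesis test: the defender queries a suspect model with the secret watermark inputs and checks whether the watermark label comes back far more often than it would from an unrelated model. The sketch below is only an illustration under stated assumptions. It uses a one-sided binomial test, which is not necessarily the test used in the paper, and the names `claim_ownership`, `binomial_tail`, and `chance_rate` are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def claim_ownership(predictions, target_label, chance_rate, alpha=0.05):
    """Decide whether a suspect model carries the watermark.

    predictions  -- labels the suspect model returned for the watermark inputs
    target_label -- label the defender trained the watermark inputs to map to
    chance_rate  -- ASSUMED rate at which a model that never saw the watermark
                    would output target_label on these inputs
    alpha        -- significance level (0.05 corresponds to 95% confidence)
    """
    n = len(predictions)
    hits = sum(1 for y in predictions if y == target_label)
    # Null hypothesis: the suspect model is unrelated, so hits ~ Binomial(n, chance_rate).
    p_value = binomial_tail(hits, n, chance_rate)
    return p_value < alpha, p_value


# Hypothetical example: 50 watermark queries to a 10-class model,
# 30 of which come back with the watermark label 3.
owned, p_value = claim_ownership([3] * 30 + [0] * 20, target_label=3, chance_rate=0.1)
```

In this framing, the number of queries needed depends on how far the watermark success rate on the suspect model sits above `chance_rate`: the larger the gap, the fewer queries are needed to reach a given confidence level.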
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Privacy-Preserving Technologies in Data
