Entangled Watermarks as a Defense against Model Extraction

Hengrui Jia; Christopher A. Choquette-Choo; Varun Chandrasekaran; Nicolas Papernot

arXiv:2002.12200 · cs.CR · February 22, 2021 · 46 cites

TL;DR

This paper proposes Entangled Watermarking Embeddings (EWE), a novel method that embeds watermarks into models by entangling them with legitimate features, making removal difficult while preserving model accuracy.

Contribution

The paper introduces EWE, a new watermarking technique that entangles watermarks with legitimate features, improving robustness against removal and enabling efficient ownership verification.

Findings

01. Supports an ownership claim with 95% confidence using fewer than 100 queries.

02. Costs less than 0.81 percentage points of accuracy on average.

03. Is effective across multiple datasets, including MNIST and CIFAR-10.
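
To make the query-budget finding concrete, here is a minimal sketch of how an ownership claim at a 95% confidence level could be checked with about 100 queries. It assumes a one-sided exact binomial test against a baseline watermark hit rate, which is an illustrative choice and not necessarily the statistical test the paper uses. The names `binomial_tail` and `claim_ownership`, the baseline rate of 0.10, and the example predictions are all hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p0: float) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def claim_ownership(predictions, target_label, baseline_rate, alpha=0.05):
    """Test whether a suspect model labels watermark inputs with the
    watermark's target label more often than an unrelated model would.

    predictions   -- suspect model's labels on the watermark inputs (one per query)
    target_label  -- label the defender assigned to the watermark inputs
    baseline_rate -- assumed hit rate of a model that was never watermarked
    alpha         -- significance level; 0.05 corresponds to 95% confidence
    """
    n = len(predictions)
    hits = sum(1 for y in predictions if y == target_label)
    # Probability of seeing at least `hits` matches by chance alone.
    p_value = binomial_tail(hits, n, baseline_rate)
    return p_value < alpha, p_value


# Hypothetical data: 100 queries, 30 of which return the watermark label 7.
suspect_predictions = [7] * 30 + [3] * 70
owned, p_value = claim_ownership(suspect_predictions, target_label=7, baseline_rate=0.10)
print(f"claim ownership: {owned} (p-value = {p_value:.2e})")
```

Under these assumptions, a suspect model whose hit rate sits at the baseline would not pass the test, so the claim depends on the watermark surviving extraction at a rate clearly above chance.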

Abstract

Machine learning involves expensive data collection and training procedures. Model owners may be concerned that valuable intellectual property can be leaked if adversaries mount model extraction attacks. As it is difficult to defend against model extraction without sacrificing significant prediction accuracy, watermarking instead leverages unused model capacity to have the model overfit to outlier input-output pairs. Such pairs are watermarks, which are not sampled from the task distribution and are only known to the defender. The defender then demonstrates knowledge of the input-output pairs to claim ownership of the model at inference. The effectiveness of watermarks remains limited because they are distinct from the task distribution and can thus be easily removed through compression or other forms of knowledge transfer. We introduce Entangled Watermarking Embeddings (EWE). Our…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Privacy-Preserving Technologies in Data