Dataset Inference: Ownership Resolution in Machine Learning

Pratyush Maini; Mohammad Yaghini; Nicolas Papernot

arXiv:2104.10706 · stat.ML · April 23, 2021 · 23 cites

TL;DR

This paper introduces dataset inference, a method for determining whether a suspected model copy contains knowledge private to the original model's training set, giving model owners a defense against model stealing attacks.

Contribution

It proposes dataset inference, a technique that combines statistical testing with estimates of how far data points lie from a model's decision boundary to identify stolen models that retain knowledge of the original training data.
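The two ingredients named above can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository listed under Code & Models for that); it is a toy illustration under stated assumptions. The function names `boundary_distance` and `permutation_p_value`, the toy `predict` classifier, and the synthetic points are all hypothetical, and a one-sided permutation test stands in for whichever statistical test the paper uses, since this page only says "statistical testing".

```python
import random


def boundary_distance(predict, x, direction, step=0.05, max_steps=200):
    """Walk from x along `direction` until the predicted label changes.

    Returns the distance travelled, or step * max_steps if no change occurs.
    Only label queries are used, so this treats the model as a black box.
    """
    start_label = predict(x)
    for k in range(1, max_steps + 1):
        probe = [xi + k * step * di for xi, di in zip(x, direction)]
        if predict(probe) != start_label:
            return k * step
    return max_steps * step


def permutation_p_value(private_scores, public_scores, n_perm=2000, seed=0):
    """One-sided permutation test on the difference of means.

    Null hypothesis: private scores are not larger on average than public ones.
    A small p-value is evidence that the suspect model treats the owner's
    private points differently from unseen public points.
    """
    rng = random.Random(seed)
    n = len(private_scores)
    m = len(public_scores)
    observed = sum(private_scores) / n - sum(public_scores) / m
    pooled = list(private_scores) + list(public_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / m
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy stand-in for a suspect model: the label depends on the sign of x[0].
def predict(x):
    return int(x[0] > 0.0)


# Synthetic data (an assumption for illustration): private points are placed
# farther from the boundary than public points.
private_points = [[0.8 + 0.01 * i, 0.0] for i in range(20)]
public_points = [[0.3 + 0.01 * i, 0.0] for i in range(20)]
direction = [-1.0, 0.0]

private_d = [boundary_distance(predict, x, direction) for x in private_points]
public_d = [boundary_distance(predict, x, direction) for x in public_points]
print("p-value:", permutation_p_value(private_d, public_d))
```

In this sketch the owner would flag the suspect model when the p-value falls below a chosen significance level. A real setting would probe many directions per point and use the actual suspect model in place of the toy `predict`.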

Findings

01

Lets model owners claim with over 99% confidence that their model was stolen.

02

Defends against state-of-the-art model stealing attacks without requiring the defended model to be retrained or overfit.

03

Requires the owner to expose only 50 of the stolen model's training points.

Abstract

With increasingly more data and computation involved in their training, machine learning models constitute valuable intellectual property. This has spurred interest in model stealing, which is made more practical by advances in learning with partial, little, or no supervision. Existing defenses focus on inserting unique watermarks in a model's decision surface, but this is insufficient: the watermarks are not sampled from the training distribution and thus are not always preserved during model stealing. In this paper, we make the key observation that knowledge contained in the stolen model's training set is what is common to all stolen copies. The adversary's goal, irrespective of the attack employed, is always to extract this knowledge or its by-products. This gives the original model's owner a strong advantage over the adversary: model owners have access to the original training data.…

Code & Models

Repositories

cleverhans-lab/dataset-inference
PyTorch · Official

Videos

Dataset Inference: Ownership Resolution in Machine Learning · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications