Data Isotopes for Data Provenance in DNNs

Emily Wenger; Xiuyu Li; Ben Y. Zhao; Vitaly Shmatikov

arXiv:2208.13893 · cs.CR · February 28, 2023

TL;DR

This paper introduces a system that lets users verify whether their data was used to train a DNN. Users create unique data points, called isotopes, that induce detectable spurious features in any model trained on them, turning model vulnerabilities into a data provenance tool.
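To make "isotope" concrete, here is a minimal sketch of one way to build them. It assumes the spurious feature is a secret random mark alpha-blended into every image the user publishes (x' = (1 - alpha) * x + alpha * mark). The functions `make_mark`, `apply_mark`, `make_isotopes`, the `seed` and `alpha` parameters, and the blending scheme are all hypothetical illustrations, not the authors' exact construction. Images are plain nested lists of grayscale pixels in [0, 1] so the example runs with the standard library alone.

```python
import random
from typing import List

# Hypothetical image type: grayscale, row-major, pixel values in [0, 1].
Image = List[List[float]]


def make_mark(height: int, width: int, seed: int) -> Image:
    """Generate a user-specific random mark (the isotope's spurious feature).

    The seed acts as the user's secret: the same seed always yields the same mark.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(width)] for _ in range(height)]


def apply_mark(image: Image, mark: Image, alpha: float = 0.25) -> Image:
    """Alpha-blend the mark into the image: x' = (1 - alpha) * x + alpha * mark."""
    if len(image) != len(mark) or any(len(r) != len(m) for r, m in zip(image, mark)):
        raise ValueError("image and mark must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [(1.0 - alpha) * px + alpha * mk for px, mk in zip(row, mrow)]
        for row, mrow in zip(image, mark)
    ]


def make_isotopes(images: List[Image], seed: int, alpha: float = 0.25) -> List[Image]:
    """Stamp the same secret mark onto every image the user is about to publish."""
    mark = make_mark(len(images[0]), len(images[0][0]), seed)
    return [apply_mark(img, mark, alpha) for img in images]


# Usage: three all-black 4x4 "photos" become isotopes carrying the same mark.
photos = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
isotopes = make_isotopes(photos, seed=1234, alpha=0.25)
```

Because the blend is convex, marked pixels stay in [0, 1]; a small `alpha` keeps the mark faint to a human viewer while remaining a consistent pattern a DNN can latch onto.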

Contribution

The paper presents a practical method for data provenance in DNNs using isotopes, enabling users to detect whether their data was used without access to the training data or details of the training process.

Findings

01

High accuracy in detecting isotopes across multiple settings

02

Effective at ImageNet scale and on public ML platforms

03

Robust against adaptive countermeasures

Abstract

Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over or knowledge of when their data is appropriated for model training. To empower users to counteract unwanted data use, we design, implement and evaluate a practical system that enables users to detect if their data was used to train a DNN model. We show how users can create special data points we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a trained model, no knowledge of the model training process, and no control over the data labels, a user can apply statistical hypothesis testing to detect if a model has learned the spurious features associated with their isotopes by training on the user's data. This effectively turns DNNs' vulnerability to memorization and spurious correlations into a tool…
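The detection step can be illustrated with a query-only hypothesis test. This is a minimal sketch under stated assumptions: the user queries the model on their own inputs with and without the mark and records a score per query, here hypothetically the model's output probability for the class the user's data belongs to. If the model learned the mark, scores on marked inputs should be higher. The abstract does not say which test the authors use; this sketch swaps in a one-sided permutation test so it runs with the standard library only. `permutation_test`, `model_learned_isotope`, and their parameters are hypothetical names.

```python
import random
from math import fsum
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    # fsum is exactly rounded, so the result does not depend on element order.
    return fsum(xs) / len(xs)


def permutation_test(marked: Sequence[float], clean: Sequence[float],
                     n_permutations: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test on the difference of means.

    H0: scores on marked and clean queries come from the same distribution.
    H1: scores on marked queries are larger (the model learned the mark).
    Returns an estimated p-value.
    """
    if not marked or not clean:
        raise ValueError("both score lists must be non-empty")
    rng = random.Random(seed)
    observed = _mean(marked) - _mean(clean)
    pooled = list(marked) + list(clean)
    n = len(marked)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign "marked"/"clean" labels under H0
        if _mean(pooled[:n]) - _mean(pooled[n:]) >= observed:
            extreme += 1
    # Add-one correction: the observed split counts as one permutation.
    return (extreme + 1) / (n_permutations + 1)


def model_learned_isotope(marked_scores: Sequence[float],
                          clean_scores: Sequence[float],
                          significance: float = 0.05, **kwargs) -> bool:
    """Reject H0 (conclude the isotopes were trained on) if p < significance."""
    return permutation_test(marked_scores, clean_scores, **kwargs) < significance
```

A permutation test makes no distributional assumption about the scores, at the cost of a Monte Carlo p-value; a parametric test such as a t-test would be a reasonable alternative when more is known about the score distribution.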

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)