Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models

Zitao Chen; Karthik Pattabiraman

arXiv:2409.06280 · cs.CR · May 23, 2025

Open Access

TL;DR

This paper introduces MembershipTracker, a practical tool that enables users to detect unauthorized use of their data in deep learning models by leveraging targeted data marking and membership inference techniques.

Contribution

It presents a novel, lightweight data auditing framework that requires minimal data marking and effectively detects unauthorized data usage in large-scale deep learning models.

Findings

01. Achieves a 0% false positive rate (FPR) at a 100% true positive rate (TPR) when detecting data usage.

02. Effective on industry-scale datasets such as ImageNet-1k.

03. Robust against multiple countermeasures.
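
To make the FPR and TPR figure in the first finding concrete, here is a minimal sketch of how such a number can be computed from membership scores. It is not taken from the paper: the function name `fpr_at_full_tpr`, the example scores, and the convention that a higher score means "more likely used in training" are all assumptions for illustration.

```python
def fpr_at_full_tpr(member_scores, non_member_scores):
    """Return the false positive rate at the strictest threshold that
    still flags every true member (i.e. at 100% TPR).

    Assumption: a sample is flagged as "used in training" when its
    score is >= the threshold, and higher scores mean stronger evidence.
    """
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")

    # The largest threshold that still catches all members is the
    # lowest score observed among the members.
    threshold = min(member_scores)

    # Count non-members that would be wrongly flagged at that threshold.
    false_positives = sum(1 for s in non_member_scores if s >= threshold)
    return false_positives / len(non_member_scores)


# Hypothetical scores, chosen only to illustrate the two cases.
separated = fpr_at_full_tpr([0.90, 0.80, 0.95], [0.10, 0.30, 0.50])
overlapping = fpr_at_full_tpr([0.90, 0.40], [0.10, 0.50, 0.30, 0.45])
print(separated, overlapping)
```

A result of 0.0 means the member and non-member scores are fully separated, so every member can be detected without wrongly flagging any non-member. Any overlap between the two groups pushes the value above zero.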

Abstract

The rise of deep learning (DL) has led to a surging demand for training data, which incentivizes the creators of DL models to trawl through the Internet for training materials. Meanwhile, users often have limited control over whether their data (e.g., facial images) are used to train DL models without their consent, which has engendered pressing concerns. This work proposes MembershipTracker, a practical data auditing tool that can empower ordinary users to reliably detect the unauthorized use of their data in training DL models. We view data auditing through the lens of membership inference (MI). MembershipTracker consists of a lightweight data marking component to mark the target data with small and targeted changes, which can be strongly memorized by the model trained on them; and a specialized MI-based verification process to audit whether the model exhibits strong memorization on…

Taxonomy

Topics: Data Quality and Management