A General Framework for Data-Use Auditing of ML Models

Zonghao Huang; Neil Zhenqiang Gong; Michael K. Reiter

arXiv:2407.15100 · cs.CR · January 28, 2025

TL;DR

This paper introduces a general framework for auditing whether a data owner's data was used to train a machine-learning model. It applies across model types and requires no prior knowledge of the ML task.

Contribution

It combines any existing black-box membership inference method with a sequential hypothesis test designed by the authors to detect data use with a quantifiable, tunable false-detection rate.

Findings

01 Effective in auditing image classifiers

02 Effective in auditing foundation models

03 Detects data use with a quantifiable, tunable false-detection rate

Abstract

Auditing the use of data in training machine-learning (ML) models is an increasingly pressing challenge, as myriad ML practitioners routinely leverage the effort of content creators to train models without their permission. In this paper, we propose a general method to audit an ML model for the use of a data-owner's data in training, without prior knowledge of the ML task for which the data might be used. Our method leverages any existing black-box membership inference method, together with a sequential hypothesis test of our own design, to detect data use with a quantifiable, tunable false-detection rate. We show the effectiveness of our proposed framework by applying it to audit data use in two types of ML models, namely image classifiers and foundation models.
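The idea of a sequential test with a tunable false-detection rate can be illustrated with a minimal sketch. It is not the authors' test, which the abstract describes only as being of their own design. It substitutes Wald's classical sequential probability ratio test (SPRT) and assumes, purely for illustration, that each audit query yields a binary outcome (True when a membership inference signal favors the audited data) that is a fair coin flip if the data was not used. The function name `sprt_bernoulli` and the parameters `p0`, `p1`, `alpha`, and `beta` are hypothetical.

```python
import math


def sprt_bernoulli(outcomes, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Wald's SPRT on binary outcomes (illustrative, not the paper's test).

    H0: P(outcome is True) = p0  (assumed: data not used)
    H1: P(outcome is True) = p1  (assumed: data used)
    alpha is the target false-detection rate, beta the target miss rate.
    Returns (decision, number_of_outcomes_consumed).
    """
    upper = math.log((1 - beta) / alpha)  # crossing this rejects H0
    lower = math.log(beta / (1 - alpha))  # crossing this accepts H0
    llr = 0.0  # running log-likelihood ratio of H1 versus H0
    n = 0
    for n, outcome in enumerate(outcomes, start=1):
        if outcome:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "detected", n
        if llr <= lower:
            return "not detected", n
    # Ran out of outcomes before either threshold was crossed.
    return "undecided", n


if __name__ == "__main__":
    # Hypothetical stream in which every query favors the audited data.
    print(sprt_bernoulli([True] * 20))
```

Lowering `alpha` raises the upper threshold, so more evidence is required before data use is declared. That is the sense in which the false-detection rate is tunable in this sketch. The bound Wald's thresholds give on that rate is approximate.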

Code & Models

Repositories

zonghaohuang007/ML_data_auditing (PyTorch, official)

Taxonomy

Topics: Business Process Modeling and Analysis