Certified Data Removal from Machine Learning Models

Chuan Guo; Tom Goldstein; Awni Hannun; Laurens van der Maaten

arXiv:1911.03030 · cs.LG · November 9, 2023 · 21 cites

TL;DR

This paper introduces certified removal, a guarantee that a model from which data has been removed cannot be distinguished from a model that never saw that data. It develops a removal mechanism for linear classifiers with strong theoretical guarantees and empirically studies the settings in which the mechanism is practical.

Contribution

It proposes the concept of certified removal for machine learning models, develops a certified-removal mechanism for linear classifiers, and empirically studies the learning settings in which that mechanism is practical.
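One way to make "cannot be distinguished" precise, in the style of differential privacy, is a two-sided bound on probability ratios. In the sketch below, A is a randomized learning algorithm, M is a removal mechanism that takes the trained model, the training set D, and the point x ∈ D to remove, 𝓗 is the space of possible models, and ε > 0 controls how close the two output distributions must be. This notation is introduced here for illustration.

```latex
e^{-\epsilon} \;\le\;
\frac{P\big(M(A(D), D, x) \in \mathcal{T}\big)}{P\big(A(D \setminus x) \in \mathcal{T}\big)}
\;\le\; e^{\epsilon}
\qquad \text{for all } D,\; x \in D,\; \mathcal{T} \subseteq \mathcal{H}
```

With ε close to 0 the ratio is pinned near 1, so no set of models 𝒯 is noticeably more or less likely after removal than after retraining from scratch on D \ x.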

Findings

01

Certified removal guarantees that a model from which data has been removed is indistinguishable from a model that never observed that data.

02

The mechanism for linear classifiers is practical in some learning settings, which the paper studies empirically.

03

Theoretical and empirical analyses support the approach.

Abstract

Good data stewardship requires removal of data at the request of the data's owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
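The mechanism developed in the paper combines a Newton update with a random perturbation of the training loss. The block below is a minimal sketch of that idea, not the code in facebookresearch/certified-removal. It assumes L2-regularized logistic regression whose objective includes a random linear term b·w, trains to the optimum, approximates removal of one point with a single Newton step on the loss over the remaining data, and compares the result with a model retrained from scratch. The helpers `grad`, `hessian`, `train`, and `newton_remove`, the λn regularization scaling, and the noise scale `sigma` are illustrative assumptions; the noise is not calibrated to any particular ε.

```python
import numpy as np

# Illustrative sketch (assumption, not the authors' code): L2-regularized logistic
# regression with a random linear perturbation b.w added to the training objective,
#   L(w; D) = sum_i log(1 + exp(-y_i * w.x_i)) + (lam * n / 2) * ||w||^2 + b.w


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad(w, X, y, lam, b):
    """Gradient of L(w; D) for labels y in {-1, +1}."""
    n = X.shape[0]
    margins = y * (X @ w)
    return -(X.T @ (y * sigmoid(-margins))) + lam * n * w + b


def hessian(w, X, y, lam):
    """Hessian of L(w; D); the linear term b.w contributes nothing here."""
    n, d = X.shape
    margins = y * (X @ w)
    s = sigmoid(margins) * sigmoid(-margins)  # second derivative of the logistic loss
    return (X.T * s) @ X + lam * n * np.eye(d)


def train(X, y, lam, b, iters=50):
    """Minimize L(w; D) with plain Newton iterations starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= np.linalg.solve(hessian(w, X, y, lam), grad(w, X, y, lam, b))
    return w


def newton_remove(w, X, y, lam, b, idx):
    """Approximate retraining without point idx by one Newton step from w."""
    keep = np.arange(X.shape[0]) != idx
    X_r, y_r = X[keep], y[keep]
    g = grad(w, X_r, y_r, lam, b)  # nonzero: w was optimal for D, not for D without idx
    H = hessian(w, X_r, y_r, lam)
    return w - np.linalg.solve(H, g)


rng = np.random.default_rng(0)
n, d, lam, sigma = 200, 5, 0.05, 1.0
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n))
b = sigma * rng.normal(size=d)  # hypothetical noise scale, not calibrated to any epsilon

w_full = train(X, y, lam, b)
w_removed = newton_remove(w_full, X, y, lam, b, idx=0)
X_r, y_r = X[1:], y[1:]
w_retrained = train(X_r, y_r, lam, b)

print("residual before update:", np.linalg.norm(grad(w_full, X_r, y_r, lam, b)))
print("residual after update: ", np.linalg.norm(grad(w_removed, X_r, y_r, lam, b)))
print("distance to retrained: ", np.linalg.norm(w_removed - w_retrained))
```

In this setup the gradient residual on the remaining data is typically orders of magnitude smaller after the update than before it, and `w_removed` sits much closer to `w_retrained` than `w_full` does. The leftover residual is what the random term `b` has to mask, so, roughly, a smaller residual means less noise is needed for the same level of indistinguishability.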

Code & Models

Repositories

facebookresearch/certified-removal
PyTorch · Official

Videos

Certified Data Removal from Machine Learning Models · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Cryptography and Data Security