Data Minimization for GDPR Compliance in Machine Learning Models

Abigail Goldsteen; Gilad Ezov; Ron Shmelkin; Micha Moffie; Ariel Farkash

arXiv:2008.04113·cs.LG·February 2, 2022

TL;DR

This paper introduces a method that reduces the amount of personal data a machine learning model needs to make predictions. It removes or generalizes input features with little to no loss of accuracy, supporting the GDPR data minimization principle.

Contribution

It presents a first-of-a-kind method that uses the knowledge encoded within a model to remove or generalize its input features, achieving data minimization in a provable manner with little to no impact on accuracy.

Findings

01. Effective reduction of input features with minimal accuracy loss

02. Enables GDPR-compliant data collection in ML models

03. Provides a provable guarantee of data minimization

Abstract

The EU General Data Protection Regulation (GDPR) mandates the principle of data minimization, which requires that only data necessary to fulfill a certain purpose be collected. However, it can often be difficult to determine the minimal amount of data required, especially in complex machine learning models such as neural networks. We present a first-of-a-kind method to reduce the amount of personal data needed to perform predictions with a machine learning model, by removing or generalizing some of the input features. Our method makes use of the knowledge encoded within the model to produce a generalization that has little to no impact on its accuracy. This enables the creators and users of machine learning models to achieve data minimization in a provable manner.
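
The idea of generalizing inputs while checking the model's behavior can be illustrated with a small sketch. This is not the authors' algorithm and not the ai-privacy-toolkit API. It is a simplified stand-in in which every name is an assumption made for illustration: the toy `model`, the synthetic grid of records, the candidate bucket widths, and the 0.99 target. Each feature is replaced by the midpoint of a fixed-width bucket, and the coarsest width is kept as long as the predictions on generalized data still agree with the predictions on raw data.

```python
def model(age, income):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if age >= 25 and income >= 50_000 else 0


def generalize(value, width):
    """Replace a value by the midpoint of its bucket; None keeps the raw value."""
    if width is None:
        return value
    return (value // width) * width + width / 2


def relative_accuracy(records, widths):
    """Share of records whose prediction is unchanged after generalization."""
    agree = 0
    for record in records:
        coarse = [generalize(v, w) for v, w in zip(record, widths)]
        agree += model(*record) == model(*coarse)
    return agree / len(records)


def minimize(records, candidates, target):
    """Greedy search: per feature, keep the coarsest width that meets the target."""
    widths = [None] * len(candidates)
    for i, options in enumerate(candidates):
        for width in options:  # options are ordered coarsest first
            trial = widths[:i] + [width] + widths[i + 1:]
            if relative_accuracy(records, trial) >= target:
                widths = trial
                break
    return widths


# Synthetic evaluation grid (assumed): ages 18..90, incomes 0..200000 in steps of 5000.
records = [(age, income)
           for age in range(18, 91)
           for income in range(0, 200_001, 5_000)]

# Candidate bucket widths per feature (assumed), coarsest first.
candidates = [[100, 50, 25, 5], [1_000_000, 100_000, 50_000, 10_000]]

chosen = minimize(records, candidates, target=0.99)
print(chosen)  # [25, 50000]
```

In this toy setting, a data collector would only need to know which 25-year age band and which 50,000-wide income band a person falls into, rather than the exact values. The agreement check is measured against the model's own predictions, which mirrors the abstract's point that the generalization is driven by what the model has learned.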

Code & Models

Repositories

IBM/ai-privacy-toolkit (official; framework tag: tf)
