TL;DR
This paper introduces a method that reduces the amount of personal data a machine learning model needs to make predictions. It removes or generalizes input features with little to no impact on accuracy, in support of the GDPR data minimization principle.
Contribution
It presents what the authors describe as a first-of-a-kind method that uses the knowledge encoded in the model to remove or generalize inputs, achieving data minimization in a provable manner with little to no impact on accuracy.
Findings
Effective reduction of input features with minimal accuracy loss
Supports the GDPR data minimization principle when collecting data for ML predictions
Lets model creators and users achieve data minimization in a provable manner
Abstract
The EU General Data Protection Regulation (GDPR) mandates the principle of data minimization, which requires that only data necessary to fulfill a certain purpose be collected. However, it can often be difficult to determine the minimal amount of data required, especially in complex machine learning models such as neural networks. We present a first-of-a-kind method to reduce the amount of personal data needed to perform predictions with a machine learning model, by removing or generalizing some of the input features. Our method makes use of the knowledge encoded within the model to produce a generalization that has little to no impact on its accuracy. This enables the creators and users of machine learning models to achieve data minimization in a provable manner.
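To make the idea of generalizing inputs concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a simple greedy search over fixed-width buckets, and every name in it (`toy_model`, `generalize`, `agreement`, `minimize`, the sample records and candidate widths) is hypothetical. Each feature is replaced by the midpoint of the bucket it falls in, and the coarsest bucket width is kept as long as the model's predictions on the generalized data still match its predictions on the original data often enough.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]


def toy_model(record: Record) -> int:
    """Hypothetical stand-in for a trained model: predicts 1 for high income."""
    return 1 if record["income"] >= 50_000 else 0


def generalize(record: Record, widths: Dict[str, float]) -> Record:
    """Replace each listed feature by the midpoint of the bucket it falls in."""
    out = dict(record)
    for name, width in widths.items():
        bucket = record[name] // width
        out[name] = bucket * width + width / 2
    return out


def agreement(model: Callable[[Record], int],
              records: List[Record],
              widths: Dict[str, float]) -> float:
    """Fraction of records whose prediction is unchanged by generalization."""
    same = sum(model(r) == model(generalize(r, widths)) for r in records)
    return same / len(records)


def minimize(model: Callable[[Record], int],
             records: List[Record],
             candidates: Dict[str, Sequence[float]],
             threshold: float) -> Dict[str, float]:
    """Greedily keep, per feature, the coarsest width that meets the threshold.

    A feature with no acceptable width is left out of the result, meaning it
    is not generalized at all.
    """
    chosen: Dict[str, float] = {}
    for name, options in candidates.items():
        for width in sorted(options, reverse=True):
            trial = {**chosen, name: width}
            if agreement(model, records, trial) >= threshold:
                chosen[name] = width
                break
    return chosen


# Made-up sample data, for illustration only.
records = [
    {"age": 23, "income": 20_000},
    {"age": 35, "income": 45_000},
    {"age": 41, "income": 52_000},
    {"age": 58, "income": 80_000},
]
candidates = {
    "age": [5, 10, 100],
    "income": [1_000, 10_000, 25_000, 1_000_000],
}

widths = minimize(toy_model, records, candidates, threshold=1.0)
print(widths)  # {'age': 100, 'income': 25000}
```

In this toy case the model ignores age, so age collapses into a single bucket and effectively does not need to be collected. Income only needs to be known to within a 25,000-wide range. Note that agreement is measured against the original model's own predictions, so this sketch checks that behavior is preserved on the sample records rather than proving anything about unseen data.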
