The Data Minimization Principle in Machine Learning
Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto

TL;DR
This paper introduces an optimization framework for implementing data minimization in machine learning. It grounds the framework in the principle's legal definitions, adapts optimization algorithms to carry it out, and examines how far the resulting data reduction actually improves privacy.
Contribution
It provides a rigorous formulation of data minimization based on legal definitions and adapts optimization algorithms for practical implementation.
Findings
Optimization algorithms can effectively reduce data collection while maintaining model performance.
The privacy benefits of data minimization may be overestimated when real-world privacy risks are not taken into account.
The framework highlights gaps between legal privacy expectations and actual privacy outcomes.
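The first finding can be made concrete with a small sketch. The code below is an illustrative assumption, not the paper's algorithm: it treats minimization as choosing which feature columns to keep, and greedily drops columns while the validation loss stays within a tolerance `alpha` of the model trained on all the data. The function names (`fit_linear`, `mse`, `greedy_minimize`), the linear model, the synthetic data, and the value of `alpha` are all hypothetical choices made for the example.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares weights for y ~ X (no intercept, for brevity)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def greedy_minimize(X_train, y_train, X_val, y_val, alpha):
    """Hypothetical greedy minimizer: drop feature columns one at a time
    while validation loss stays within alpha of the full-data baseline."""
    keep = np.ones(X_train.shape[1], dtype=bool)
    baseline = mse(X_val, y_val, fit_linear(X_train, y_train))
    while keep.sum() > 1:
        best_j, best_loss = None, None
        # Find the single column whose removal hurts validation loss least.
        for j in np.flatnonzero(keep):
            trial = keep.copy()
            trial[j] = False
            w = fit_linear(X_train[:, trial], y_train)
            loss = mse(X_val[:, trial], y_val, w)
            if best_loss is None or loss < best_loss:
                best_j, best_loss = j, loss
        # Stop once even the cheapest removal exceeds the tolerance.
        if best_loss - baseline > alpha:
            break
        keep[best_j] = False
    return keep, baseline

# Synthetic data (an assumption for the demo): only features 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=400)
X_train, X_val = X[:300], X[300:]
y_train, y_val = y[:300], y[300:]

keep, baseline = greedy_minimize(X_train, y_train, X_val, y_val, alpha=0.05)
print("kept feature mask:", keep)
```

On this synthetic data the uninformative columns can be discarded with almost no change in validation loss. The sketch measures only utility, so it says nothing about whether the discarded data could still be inferred or reconstructed, which is the kind of gap the second and third findings point to.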
Abstract
The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has been endorsed by various global data protection regulations. However, its practical implementation remains a challenge due to the lack of a rigorous formulation. This paper addresses this gap and introduces an optimization framework for data minimization based on its legal definitions. It then adapts several optimization algorithms to perform data minimization and conducts a comprehensive evaluation in terms of their compliance with minimization objectives as well as their impact on user privacy. Our analysis underscores the mismatch between the privacy expectations of data minimization and the actual privacy benefits, emphasizing the need for…
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Advanced Data Processing Techniques
