The trade-off between data minimization and fairness in collaborative filtering
Nasim Sonboli, Sipei Li, Mehdi Elahi, Asia Biega

TL;DR
This paper investigates the conflicting goals of data minimization and fairness in recommender systems under GDPR, analyzing how active learning strategies impact accuracy and fairness, and highlighting the trade-offs involved.
Contribution
It introduces an analysis of the trade-offs between data minimization via active learning and fairness in recommender systems, a topic with limited prior research.
Findings
The effect on accuracy varies across active learning strategies.
Nearly all strategies tend to negatively impact fairness.
The results offer insights for developing GDPR-compliant recommender systems.
Abstract
General Data Protection Regulations (GDPR) aim to safeguard individuals' personal information from harm. While full compliance is mandatory in the European Union and the California Privacy Rights Act (CPRA), it is not in other places. GDPR requires simultaneous compliance with all the principles such as fairness, accuracy, and data minimization. However, it overlooks the potential contradictions within its principles. This matter gets even more complex when compliance is required from decision-making systems. Therefore, it is essential to investigate the feasibility of simultaneously achieving the goals of GDPR and machine learning, and the potential tradeoffs that might be forced upon us. This paper studies the relationship between the principles of data minimization and fairness in recommender systems. We operationalize data minimization via active learning (AL) because, unlike many…
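To make the idea of data minimization via active learning more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' method: the popularity-based selection heuristic, the per-user rating budget, the group-gap measure, and all function and variable names (`popularity_strategy`, `minimize`, `group_gap`) are hypothetical, since this summary does not specify the strategies, datasets, or fairness metrics the paper uses.

```python
from collections import Counter


def popularity_strategy(candidates, popularity):
    """Rank candidate items by global popularity (hypothetical AL heuristic).

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return sorted(candidates, key=lambda item: (-popularity[item], item))


def minimize(ratings, budget):
    """Keep at most `budget` ratings per user, chosen by the heuristic above.

    `ratings` maps user -> {item: rating}. This stands in for a data
    minimization step: only the selected ratings would be retained.
    """
    popularity = Counter(
        item for user_items in ratings.values() for item in user_items
    )
    kept = {}
    for user, items in ratings.items():
        chosen = popularity_strategy(items.keys(), popularity)[:budget]
        kept[user] = {item: items[item] for item in chosen}
    return kept


def group_gap(per_user_error, groups):
    """Difference between the highest and lowest mean error across user groups.

    A larger gap means the model serves some groups worse than others.
    This is one simple (assumed) way to quantify unfairness.
    """
    by_group = {}
    for user, err in per_user_error.items():
        by_group.setdefault(groups[user], []).append(err)
    means = {g: sum(errs) / len(errs) for g, errs in by_group.items()}
    return max(means.values()) - min(means.values())


# Toy data, invented for illustration only.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 2, "b": 4},
    "u3": {"a": 1},
}
minimized = minimize(ratings, budget=2)
```

In a fuller experiment one would train a collaborative filtering model on the minimized ratings, compute a per-user error, and compare `group_gap` before and after minimization for each selection strategy.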
Taxonomy
Topics: Privacy, Security, and Data Protection · Mobile Crowdsensing and Crowdsourcing · Privacy-Preserving Technologies in Data
