TL;DR
This paper systematically examines how anonymization techniques like $k$-anonymity, $\ell$-diversity, and $t$-closeness affect machine learning fairness, revealing significant impacts on group fairness and input homogeneity.
Contribution
It provides the first comprehensive quantitative analysis of anonymization's effects on both individual and group ML fairness metrics.
Findings
Anonymization can degrade group fairness metrics by up to a factor of four.
Stronger anonymization improves similarity-based individual fairness metrics.
Trade-offs between privacy, fairness, and utility are highlighted.
Abstract
Machine learning (ML) algorithms rely heavily on the availability of training data, which, depending on the domain, often includes sensitive information about data providers. This raises critical privacy concerns. Anonymization techniques have emerged as a practical solution to address these issues by generalizing features or suppressing data to make it more difficult to accurately identify individuals. Although recent studies have shown that privacy-enhancing technologies can influence ML predictions across different subgroups, thus affecting fair decision-making, the specific effects of anonymization techniques, such as $k$-anonymity, $\ell$-diversity, and $t$-closeness, on ML fairness remain largely unexplored. In this work, we systematically audit the impact of anonymization techniques on ML fairness, evaluating both individual and group fairness. Our quantitative study reveals…
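To make the two quantities in the abstract concrete, here is a minimal Python sketch, not the paper's code. It generalizes one quasi-identifier, measures the resulting $k$ as the size of the smallest equivalence class, and computes one common group fairness metric. The records, the helper names (`generalize_age`, `k_of`, `demographic_parity_difference`), and the choice of demographic parity are all assumptions for illustration; the summary above does not say which metrics or datasets the paper uses.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a coarse range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_of(records, quasi_identifiers):
    """Size of the smallest equivalence class over the quasi-identifiers.

    A table is k-anonymous for every k up to this value.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


def demographic_parity_difference(predictions, groups):
    """Absolute gap in positive-prediction rate between two groups."""
    rates = {}
    for g in set(groups):
        members = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    a, b = rates.values()  # this sketch assumes exactly two groups
    return abs(a - b)


# Hypothetical toy records; "age" and "zip" act as quasi-identifiers.
records = [
    {"age": 31, "zip": "12345"},
    {"age": 34, "zip": "12345"},
    {"age": 38, "zip": "12345"},
    {"age": 42, "zip": "12345"},
    {"age": 47, "zip": "12345"},
]

# Every exact age is unique, so each record is its own equivalence class.
k_raw = k_of(records, ["age", "zip"])

# Generalizing age into decades merges records into larger classes.
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]
k_generalized = k_of(generalized, ["age", "zip"])

# Hypothetical binary predictions for two groups, "a" and "b".
dpd = demographic_parity_difference(
    predictions=[1, 0, 1, 1, 0, 0],
    groups=["a", "a", "a", "b", "b", "b"],
)
```

An audit in the spirit of the abstract would train a model on the raw and on the generalized table, then compare a fairness metric such as `dpd` across the two. The sketch only shows how each quantity is computed.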
