Evaluating the Fairness Impact of Differentially Private Synthetic Data
Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen

TL;DR
This paper investigates how differentially private (DP) synthetic data affects fairness in downstream machine learning. It finds that DP synthesizers often degrade fairness, and that pre-processing the training data with multi-label undersampling can mitigate the effect.
Contribution
The paper evaluates four DP synthesizers, links fairness outcomes to the proportion of minority groups in the generated data, and proposes a multi-label undersampling pre-processing step that improves fairness without degrading accuracy.
Findings
Three of the four DP synthesizers frequently degrade fairness on downstream binary classification tasks.
Fairness correlates with the proportion of minority groups present in the synthetic data.
Training synthesizers on data pre-processed with multi-label undersampling can improve fairness without degrading accuracy.
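To make the second finding concrete, the sketch below measures how much of a dataset belongs to each group, and how far apart the groups' positive-prediction rates are. It is a minimal illustration, not the paper's evaluation code. This summary does not say which fairness metrics the paper uses, so the demographic parity gap here is an assumed stand-in. The function names, column names, and toy rows are all hypothetical.

```python
from collections import Counter

def group_proportions(rows, group_key):
    """Return the share of rows belonging to each value of group_key."""
    counts = Counter(row[group_key] for row in rows)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def demographic_parity_gap(rows, group_key, pred_key):
    """Largest difference in positive-prediction rate between any two groups.

    Assumed metric for illustration; the paper may use different ones.
    """
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row[group_key]] += 1
        positives[row[group_key]] += int(row[pred_key] == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# Hypothetical toy data: the minority group shrinks in the synthetic copy.
real_rows = [{"group": "majority"}] * 8 + [{"group": "minority"}] * 2
synthetic_rows = [{"group": "majority"}] * 19 + [{"group": "minority"}] * 1

real_share = group_proportions(real_rows, "group")["minority"]
synthetic_share = group_proportions(synthetic_rows, "group")["minority"]

# Hypothetical classifier outputs, used only to exercise the gap function.
predictions = (
    [{"group": "majority", "pred": 1}] * 4
    + [{"group": "majority", "pred": 0}] * 4
    + [{"group": "minority", "pred": 0}] * 2
)
gap = demographic_parity_gap(predictions, "group", "pred")
```

Comparing `real_share` with `synthetic_share` shows whether a synthesizer has suppressed the minority group, which is the quantity the finding ties to fairness outcomes.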
Abstract
Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information. Due to the suppression of underrepresented classes that is often required to achieve privacy, however, it may be in conflict with fairness. We evaluate four DP synthesizers and present empirical results indicating that three of these models frequently degrade fairness outcomes on downstream binary classification tasks. We draw a connection between fairness and the proportion of minority groups present in the generated synthetic data, and find that training synthesizers on data that are pre-processed via a multi-label undersampling method can promote more fair outcomes without degrading accuracy.
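The abstract does not spell out the multi-label undersampling procedure. One plausible reading, sketched below as an assumption, is to balance the training data over every combination of several columns (for example a protected attribute and the target label) by randomly downsampling each combination to the size of the smallest one. The function name `undersample_by_combination`, the column names, and the toy rows are hypothetical, and the paper's actual method may differ.

```python
import random
from collections import Counter, defaultdict

def undersample_by_combination(rows, keys, seed=0):
    """Downsample so every observed combination of `keys` has equal size.

    Assumed interpretation of "multi-label undersampling", for illustration.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in keys)].append(row)
    # Every combination is reduced to the size of the rarest one.
    target = min(len(bucket) for bucket in buckets.values())
    balanced = []
    for bucket in buckets.values():
        balanced.extend(rng.sample(bucket, target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: four (group, label) combinations of unequal size.
rows = (
    [{"group": "majority", "label": 1}] * 6
    + [{"group": "majority", "label": 0}] * 4
    + [{"group": "minority", "label": 1}] * 2
    + [{"group": "minority", "label": 0}] * 3
)
balanced = undersample_by_combination(rows, keys=("group", "label"))
combo_counts = Counter((r["group"], r["label"]) for r in balanced)
```

In this sketch the balanced rows, rather than the raw data, would be passed to the DP synthesizer. The trade-off is that undersampling discards majority-group records, so it suits datasets where the rarest combination is still reasonably large.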
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Internet Traffic Analysis and Secure E-voting
