The Price of Fair PCA: One Extra Dimension
Samira Samadi, Uthaipon Tantipongpipat, Jamie Morgenstern, Mohit Singh, Santosh Vempala

TL;DR
This paper examines how standard PCA can produce biased data representations for different populations and proposes a Fair PCA method that ensures similar fidelity across groups, demonstrated on real-world datasets.
Contribution
The paper defines Fair PCA and introduces a polynomial-time algorithm that finds a low-dimensional representation that is nearly optimal with respect to this measure, keeping reconstruction fidelity similar across two populations.
Findings
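On several real-world data sets, standard PCA has higher reconstruction error on one population than on another (for example, women versus men, or lower- versus higher-educated individuals), even when both populations contribute a similar number of samples.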
The proposed Fair PCA algorithm reduces disparity in data fidelity.
Experiments on real-world data sets show that the algorithm efficiently generates fair low-dimensional representations.
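The first finding can be checked on any data set by fitting ordinary PCA on the pooled data and then measuring the reconstruction error separately for each group. The sketch below does this with NumPy on synthetic data; the two groups, their variance scales, and the helper names `pca_projection` and `avg_reconstruction_error` are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def pca_projection(X, d):
    """Return the (n_features, d) matrix of top-d principal directions of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def avg_reconstruction_error(X, V, mean):
    """Average squared reconstruction error per sample after projecting onto span(V)."""
    Xc = X - mean
    residual = Xc - Xc @ V @ V.T
    return float(np.sum(residual ** 2) / X.shape[0])

rng = np.random.default_rng(0)
n, p, d = 500, 10, 2

# Hypothetical groups of equal size: A concentrates its variance on the
# first two axes, B spreads its variance over three other axes.
scales_a = np.array([5, 4, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
scales_b = np.array([1, 1, 3, 3, 3, 1, 1, 1, 1, 1], dtype=float)
A = rng.normal(size=(n, p)) * scales_a
B = rng.normal(size=(n, p)) * scales_b

# Standard PCA fitted on the pooled data, then evaluated per group.
X = np.vstack([A, B])
mean = X.mean(axis=0)
V = pca_projection(X, d)

err_a = avg_reconstruction_error(A, V, mean)
err_b = avg_reconstruction_error(B, V, mean)
print(f"group A error: {err_a:.2f}, group B error: {err_b:.2f}")
```

In this synthetic setup the pooled principal directions align with group A's dominant axes, so group B is reconstructed worse even though both groups have the same number of samples.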
Abstract
We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show on several real-world data sets, PCA has higher reconstruction error on population A than on B (for example, women versus men or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques which maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low dimensional representation of the data which is nearly-optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low dimensional representation of the data.
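One way to make "similar fidelity for A and B" precise is a min-max objective over the two groups. The formulation below is a sketch under assumed notation, not a quotation from the abstract: $A$ and $B$ are the data matrices of the two populations, $U$ is a rank-$d$ approximation of the stacked data with row blocks $U_A$ and $U_B$, and $\widehat{Y}$ is the best rank-$d$ approximation of $Y$ on its own.

```latex
\[
\operatorname{loss}(Y, Z) = \lVert Y - Z \rVert_F^2 - \lVert Y - \widehat{Y} \rVert_F^2
\]
\[
\min_{U \,:\, \operatorname{rank}(U) \le d} \;
\max\left\{ \frac{1}{|A|}\operatorname{loss}(A, U_A),\;
            \frac{1}{|B|}\operatorname{loss}(B, U_B) \right\}
\]
```

Under this reading, the loss measures how much worse a group fares under the shared projection than under its own optimal rank-$d$ projection, and the objective minimizes the larger of the two per-sample losses.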
Taxonomy
Topics: ICT Impact and Policies
Methods: Principal Components Analysis
