The Price of Fair PCA: One Extra Dimension

Samira Samadi; Uthaipon Tantipongpipat; Jamie Morgenstern; Mohit Singh; Santosh Vempala

arXiv:1811.00103 · cs.LG · November 2, 2018 · 24 cites

TL;DR

This paper examines how standard PCA can produce data representations of different fidelity for different populations and proposes Fair PCA, a method that keeps reconstruction fidelity similar across groups, demonstrated on real-world datasets.

Contribution

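The paper introduces a polynomial-time algorithm for Fair PCA that balances reconstruction error across two populations. As the title indicates, the price is at most one extra dimension: the algorithm's (d+1)-dimensional output is at least as good, under the Fair PCA measure, as the best d-dimensional projection.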
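For reference, here is one way to write the objective behind this contribution, following our reading of the paper's definitions. The notation is ours: A and B are the two groups' data matrices (rows are samples), M is their stack with m rows and n columns, U_A and U_B are the rows of a candidate approximation U that belong to each group, d is the target dimension, and \hat{Y} is the best rank-d approximation of Y.

```latex
\mathrm{loss}(Y, Z) = \lVert Y - Z \rVert_F^2 - \lVert Y - \hat{Y} \rVert_F^2,
\qquad
\min_{\substack{U \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(U) \le d}}
  \max\left\{ \frac{\mathrm{loss}(A, U_A)}{|A|},\ \frac{\mathrm{loss}(B, U_B)}{|B|} \right\}
```

Measuring each group against its own best rank-d approximation means a group is not penalized for being intrinsically harder to compress, and dividing by |A| and |B| keeps the larger group from dominating the objective.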

Findings

01

PCA often has higher reconstruction error for certain populations.

02

The proposed Fair PCA algorithm reduces disparity in data fidelity.

03

Experiments on real-world data show that the algorithm efficiently produces fair low-dimensional representations.
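To make Finding 01 concrete, here is a minimal NumPy sketch that fits standard PCA on pooled data and reports each group's average squared reconstruction error. The data are synthetic and hypothetical (not from the paper's experiments), as are the helper names pca_projection and avg_reconstruction_error. Both groups have the same number of samples, echoing the point that a gap can appear even with balanced groups.

```python
import numpy as np

def pca_projection(X, d):
    """Projection matrix onto the top-d principal directions of X (rows are samples).

    X is assumed to be mean-centered already.
    """
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:d].T  # n_features x d orthonormal basis
    return V @ V.T

def avg_reconstruction_error(X, P):
    """Average squared reconstruction error per row of X after projecting with P."""
    residual = X - X @ P
    return float(np.sum(residual ** 2) / X.shape[0])

# Hypothetical synthetic data: two groups of equal size whose variance
# concentrates on different features.
rng = np.random.default_rng(0)
n_per_group, d = 500, 2
A = rng.normal(size=(n_per_group, 5)) * np.array([1.0, 1.0, 3.0, 0.5, 0.5])
B = rng.normal(size=(n_per_group, 5)) * np.array([4.0, 3.5, 1.0, 0.5, 0.5])

M = np.vstack([A, B])
M = M - M.mean(axis=0)                 # standard PCA centers the pooled data
A_c, B_c = M[:n_per_group], M[n_per_group:]

P = pca_projection(M, d)               # one projection fit on everyone
err_A = avg_reconstruction_error(A_c, P)
err_B = avg_reconstruction_error(B_c, P)
print(f"average error, group A: {err_A:.2f}; group B: {err_B:.2f}")
```

Group B's variance sits on the features that pooled PCA keeps, so group A absorbs most of the reconstruction error even though the sample sizes match.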

Abstract

We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show that on several real-world data sets, PCA has higher reconstruction error on population A than on B (for example, women versus men or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low-dimensional representation of the data that is nearly optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low-dimensional representation of the data.
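The Fair PCA measure can be sketched in NumPy as follows. We assume, following our reading of the paper, that each group's loss is its average reconstruction error in excess of the error of its own best rank-d projection, and that the fair objective is the larger of the two group losses. The search routine reweighted_pca is a hypothetical heuristic swapped in for illustration: it scans convex combinations of the two group covariances and keeps the fairest rank-d projection. It is not the paper's algorithm (as we understand it, the paper solves a semidefinite relaxation and rounds the result, which is where the extra dimension comes from) and it carries no optimality guarantee. The data are synthetic.

```python
import numpy as np

def top_d_projection(C, d):
    """Projection onto the top-d eigenvectors of a symmetric PSD matrix C."""
    _, eigvecs = np.linalg.eigh(C)       # eigenvalues come back in ascending order
    V = eigvecs[:, -d:]
    return V @ V.T

def group_loss(Y, P, d):
    """Average excess reconstruction error of Y under P versus Y's own best rank-d projection."""
    best = top_d_projection(Y.T @ Y, d)  # Eckart-Young: optimal rank-d subspace for Y alone
    err = np.sum((Y - Y @ P) ** 2)
    best_err = np.sum((Y - Y @ best) ** 2)
    return float((err - best_err) / Y.shape[0])

def fair_objective(A, B, P, d):
    """Worst of the two group losses; a fair projection keeps this small."""
    return max(group_loss(A, P, d), group_loss(B, P, d))

def reweighted_pca(A, B, d, grid=101):
    """Hypothetical heuristic: run PCA on t*cov(A) + (1-t)*cov(B) for t on a grid
    and keep the projection with the smallest fair objective."""
    cov_A = A.T @ A / A.shape[0]
    cov_B = B.T @ B / B.shape[0]
    best_P, best_val = None, np.inf
    for t in np.linspace(0.0, 1.0, grid):
        P = top_d_projection(t * cov_A + (1.0 - t) * cov_B, d)
        val = fair_objective(A, B, P, d)
        if val < best_val:
            best_P, best_val = P, val
    return best_P, best_val

# Hypothetical 2-D groups (zero-mean by construction), each stretched along a
# different direction; B is more spread out, so pooled PCA leans toward it.
rng = np.random.default_rng(1)
theta = np.pi / 4
R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
A = rng.normal(size=(400, 2)) * np.array([3.0, 0.5])
B = (rng.normal(size=(400, 2)) * np.array([4.0, 0.5])) @ R

d = 1
pooled = np.vstack([A, B])
P_std = top_d_projection(pooled.T @ pooled, d)  # standard PCA on the pooled data
P_fair, fair_val = reweighted_pca(A, B, d)
std_val = fair_objective(A, B, P_std, d)
print(f"worst-group loss: standard PCA {std_val:.3f}, reweighted {fair_val:.3f}")
```

Reweighting only ever returns a d-dimensional projection onto eigenvectors of a weighted covariance. When the two group covariances are diagonal in the same basis, those eigenvectors stay on the coordinate axes for every weight, and the heuristic cannot improve on standard PCA; a relaxation over mixtures of projections does not have that limitation.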

Code & Models

Repositories

samirasamadi/Fair-PCA (official)

Taxonomy

Topics: ICT Impact and Policies

Methods: Principal Components Analysis