Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun

TL;DR
This paper introduces a statistically grounded framework for fair PCA, proposes a memory-efficient streaming algorithm called FNPM with statistical guarantees, and demonstrates its efficacy and fairness on real data.
Contribution
The paper formulates fair PCA within a new learnability framework and develops the first memory-efficient streaming algorithm for fair PCA with statistical guarantees.
Findings
The proposed FNPM algorithm is memory-efficient and suitable for streaming data.
The theoretical analysis shows that fair PCA is learnable in the probably approximately fair and optimal (PAFO) sense.
Empirical results confirm the algorithm's efficacy and fairness on real datasets.
Abstract
Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called probably approximately fair and optimal (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitation, we propose a new setting called fair streaming PCA along with a memory-efficient algorithm, fair noisy…
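To make the two ideas in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's FNPM algorithm, whose details are not given in the text above. It shows a generic streaming block power method for ordinary PCA that keeps only a d x k matrix in memory, plus a simple first-moment proxy for fairness: the gap between the projected means of two groups. The function names `streaming_power_method` and `group_mean_gap`, the binary encoding of the sensitive attribute, and the one-QR-step-per-batch schedule are all assumptions made for illustration. Matching means is weaker than matching the full projected distributions that the abstract describes.

```python
import numpy as np


def streaming_power_method(batches, d, k, seed=0):
    """Generic block power method over a stream (illustrative, not FNPM).

    Each mini-batch X of shape (b, d) is used once and then discarded.
    Only the d x k iterate V is stored, so the d x d covariance matrix
    is never formed.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis.
    V, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for X in batches:
        # X.T @ (X @ V) / b equals (empirical covariance of the batch) @ V,
        # computed in O(b * d * k) time without building the covariance.
        W = X.T @ (X @ V) / X.shape[0]
        # Re-orthonormalize the columns.
        V, _ = np.linalg.qr(W)
    return V


def group_mean_gap(X, a, V):
    """Euclidean distance between projected group means.

    `a` is assumed to be a 0/1 array marking the sensitive attribute.
    A value near zero means the two groups share a mean after projection;
    it says nothing about higher moments.
    """
    Z = X @ V
    return float(np.linalg.norm(Z[a == 0].mean(axis=0) - Z[a == 1].mean(axis=0)))
```

In use, `batches` can be any iterable that yields arrays of shape (b, d), such as a generator reading chunks from disk, and `group_mean_gap` can be evaluated on a held-out sample to see how far the learned subspace separates the two groups.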
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Blind Source Separation Techniques
Methods: Principal Components Analysis
