Learning Interpretable Fair Representations
Tianhao Wang, Zana Buçinca, Zilin Ma

TL;DR
This paper introduces a framework for learning fair, interpretable data representations that enhance utility and fairness in predictive tasks, enabling better insights and exploration by third parties.
Contribution
The paper proposes a method that incorporates interpretable prior knowledge into fair representation learning, improving interpretability and fairness, and slightly improving accuracy, over existing approaches.
Findings
The learned representations are more interpretable and provide insights beyond the prediction task.
They achieve slightly higher accuracy in downstream tasks.
They attain fairer outcomes than state-of-the-art methods.
Abstract
Numerous approaches have been recently proposed for learning fair representations that mitigate unfair outcomes in prediction tasks. A key motivation for these methods is that the representations can be used by third parties with unknown objectives. However, because current fair representations are generally not interpretable, the third party cannot use these fair representations for exploration, or to obtain any additional insights, besides the pre-contracted prediction tasks. Thus, to increase data utility beyond prediction tasks, we argue that the representations need to be fair, yet interpretable. We propose a general framework for learning interpretable fair representations by introducing an interpretable "prior knowledge" during the representation learning process. We implement this idea and conduct experiments with ColorMNIST and Dsprite datasets. The results indicate that in…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
