TL;DR
This paper learns fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label, and it targets demographic parity by enforcing independencies in the learned model.
Contribution
It proposes an algorithm for learning probabilistic circuits from incomplete data, in which a latent variable stands for the unbiased label behind the observed, biased labels, and supports the approach with theoretical and empirical analysis.
Findings
Successfully retrieves fair labels from biased data in synthetic experiments.
Outperforms existing methods both in modeling the data distribution and in fairness.
Achieves competitive accuracy on real-world datasets.
Abstract
Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring fairness of these systems is critical. This is often challenging as the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We evaluate…
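The latent-label idea in the abstract can be illustrated with a small sketch. This is not the paper's probabilistic-circuit model: it is a hand-built three-variable toy, and every name and number in it (`p_s`, `p_df`, `p_d1_given`, `parity_gap`) is a hypothetical chosen for illustration. It assumes that the enforced independence is between the latent fair label Df and the sensitive attribute S, and it measures demographic parity as the gap between P(label = 1 | S = 1) and P(label = 1 | S = 0).

```python
from itertools import product

# Hypothetical toy parameters (illustrative only, not taken from the paper).
p_s = {0: 0.5, 1: 0.5}    # P(S): sensitive attribute
p_df = {0: 0.6, 1: 0.4}   # P(Df): latent fair label, independent of S by construction
# P(D = 1 | Df, S): the observed label is a biased version of the latent one.
p_d1_given = {(0, 0): 0.05, (0, 1): 0.20, (1, 0): 0.70, (1, 1): 0.95}


def joint(s, df, d):
    """Joint probability P(S = s, Df = df, D = d) under the toy factorization."""
    p_d1 = p_d1_given[(df, s)]
    return p_s[s] * p_df[df] * (p_d1 if d == 1 else 1.0 - p_d1)


def positive_rate(var, s):
    """P(var = 1 | S = s), where var is 'd' (observed) or 'df' (latent)."""
    num = 0.0
    den = 0.0
    for df, d in product((0, 1), repeat=2):
        p = joint(s, df, d)
        den += p
        value = d if var == "d" else df
        if value == 1:
            num += p
    return num / den


def parity_gap(var):
    """Absolute demographic parity gap between the two groups."""
    return abs(positive_rate(var, 1) - positive_rate(var, 0))


if __name__ == "__main__":
    print("gap for observed label D:", parity_gap("d"))
    print("gap for latent label Df:", parity_gap("df"))
```

In this toy, the observed label violates demographic parity because its conditional distribution depends on S, while the latent label satisfies it by construction. Learning such a structure from data in which Df is never observed is the part the paper addresses with probabilistic circuits; the sketch only shows what the independence buys.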