TL;DR
This paper emphasizes the importance of modeling data missingness using causal graphs to understand its impact on fairness guarantees in machine learning, revealing limitations of existing algorithms and proposing a decentralized fair algorithm.
Contribution
It introduces a causal framework for analyzing missing data mechanisms in fairness, and develops a decentralized algorithm that maintains fairness without requiring recoverable distributions.
Findings
Many fairness algorithms cannot guarantee fairness due to unmodeled missingness.
Causal modeling identifies the minimal distributions needed for fair decision-making.
Decentralized algorithms can match centralized performance in multi-stage screening.
Abstract
Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those who were denied one. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we characterize the missingness mechanisms in different real-world scenarios. We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data. Our theoretical results imply that many of these algorithms cannot guarantee fairness in practice. Modeling missingness also helps to identify correct design principles for fair algorithms. For example, in multi-stage settings where decisions are made in multiple screening rounds, we…
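The loan example can be made concrete with a small simulation. This is a minimal sketch, not the paper's causal framework. It assumes a hypothetical synthetic population in which repayment depends only on a score, so both groups have the same true repayment rate. It also assumes a hypothetical past policy that applied a stricter approval threshold to group 1. Because outcomes are recorded only for approved applicants, the naive group-conditional repayment rates computed from the training data differ from the population rates. These are the kind of distributions that fairness criteria rely on.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    group: int     # protected attribute A (0 or 1)
    score: int     # hypothetical credit-score proxy X in 0..9
    repaid: bool   # outcome Y; only observed if the loan was granted


def build_population():
    """Hypothetical synthetic population: for each group and score x there are
    100 applicants, of whom 10*x repay, so P(Y=1 | X=x) = x/10 in both groups."""
    people = []
    for group in (0, 1):
        for score in range(10):
            for i in range(100):
                people.append(Applicant(group, score, i < 10 * score))
    return people


# Hypothetical historical policy (an assumption, not from the paper):
# group 1 faced a stricter approval threshold than group 0.
PAST_THRESHOLD = {0: 5, 1: 7}


def was_approved(p):
    return p.score >= PAST_THRESHOLD[p.group]


def repay_rate(people, group):
    """Fraction of members of `group` in `people` who repaid."""
    members = [p for p in people if p.group == group]
    return sum(p.repaid for p in members) / len(members)


population = build_population()
# Labels exist only for past approvals: this is the missingness mechanism.
training_data = [p for p in population if was_approved(p)]

for g in (0, 1):
    print(g, repay_rate(population, g), repay_rate(training_data, g))
```

In the full population both groups repay at the same rate (0.45). In the approved-only training data, group 0 appears to repay at 0.7 and group 1 at 0.8. A fairness constraint estimated from the training data would therefore be calibrated to a distribution that does not match deployment.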