TL;DR
This paper compares fairness-enhancing algorithms in machine learning through an open benchmark. It finds that many fairness measures correlate with one another and that the algorithms are sensitive to variations in dataset composition, which raises concerns about the robustness of these interventions.
Contribution
It introduces an open benchmark that compares fairness interventions across multiple datasets and fairness measures, drawing attention to under-appreciated aspects of their behavior and robustness.
Findings
Fairness measures are often correlated.
Algorithms are sensitive to dataset composition changes.
Fairness interventions may be more brittle than expected.
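The first two findings can be illustrated with a small sketch. This is not the paper's benchmark: the data are synthetic, and the function names (`positive_rate`, `disparate_impact`, `parity_difference`, `simulate_splits`) and the group positive rates are assumptions chosen for illustration. The sketch computes two common fairness measures on repeated random subsamples of the same population, so one can see how much each measure varies across samples and how closely the two track each other.

```python
import random

def positive_rate(preds, groups, group):
    """Fraction of positive predictions among members of `group`."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    return sum(selected) / len(selected)

def disparate_impact(preds, groups):
    """Ratio of positive rates: unprivileged (0) over privileged (1)."""
    return positive_rate(preds, groups, 0) / positive_rate(preds, groups, 1)

def parity_difference(preds, groups):
    """Difference of positive rates: unprivileged (0) minus privileged (1)."""
    return positive_rate(preds, groups, 0) - positive_rate(preds, groups, 1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_splits(n_splits=50, n_samples=400, seed=0):
    """Compute both measures on repeated random subsamples.

    The population is synthetic (an assumption for illustration): the
    privileged group receives positive predictions at rate 0.5, the
    unprivileged group at rate 0.35.
    """
    rng = random.Random(seed)
    groups = [rng.randint(0, 1) for _ in range(2000)]
    preds = [int(rng.random() < (0.5 if g == 1 else 0.35)) for g in groups]
    di_values, spd_values = [], []
    for _ in range(n_splits):
        idx = rng.sample(range(len(groups)), n_samples)
        p = [preds[i] for i in idx]
        g = [groups[i] for i in idx]
        di_values.append(disparate_impact(p, g))
        spd_values.append(parity_difference(p, g))
    return di_values, spd_values

if __name__ == "__main__":
    di, spd = simulate_splits()
    print("disparate impact range:", min(di), max(di))
    print("parity difference range:", min(spd), max(spd))
    print("correlation between measures:", pearson(di, spd))
```

Both measures are functions of the same two group-level positive rates, so in this toy setting they move together by construction. The spread of values across subsamples hints at why a single train-test split can give a misleading picture of an algorithm's fairness.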
Abstract
Computers are increasingly used to make decisions that have significant impact in people's lives. Often, these predictions can affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers and predictors have appeared in the literature. This paper seeks to study the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we seek to bring attention to many under-appreciated aspects of such fairness-enhancing interventions. Concretely, we present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures, and a large number of existing datasets. We find that although different algorithms tend to prefer…
