TL;DR
This study empirically evaluates fairness-enhancing methods that operate at different points in the ML pipeline on four real-world, high-stakes public policy and social good problems. It finds that post-processing with group-specific thresholds reliably reduces disparities across these problems.
Contribution
It provides an empirical comparison of fairness methods across four real-world policy problems, a setting where little such evaluation exists, and highlights the effectiveness of post-processing approaches.
Findings
Post-processing with group-specific thresholds consistently reduces disparities.
Many fairness methods show variable and inconsistent performance.
Empirical evidence supports using threshold adjustments for fairer ML outcomes.
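The idea behind group-specific thresholds can be illustrated with a minimal sketch. It assumes the disparity of interest is recall (true positive rate) and that each group gets its own score cutoff chosen to reach a shared target recall. The function names, the toy data, and the choice of recall as the metric are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def group_thresholds_for_recall(scores, labels, groups, target_recall):
    """For each group, pick the highest score cutoff whose recall >= target_recall."""
    by_group = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        by_group[g].append((s, y))

    thresholds = {}
    for g, rows in by_group.items():
        positives = sum(y for _, y in rows)
        if positives == 0:
            thresholds[g] = None  # recall is undefined without positive labels
            continue
        found = 0
        # Walk candidates from the highest score down until recall is reached.
        for s, y in sorted(rows, key=lambda r: r[0], reverse=True):
            found += y
            if found / positives >= target_recall:
                thresholds[g] = s
                break
    return thresholds


def predict(scores, groups, thresholds):
    """Flag a case when its score meets its own group's cutoff."""
    return [
        int(thresholds[g] is not None and s >= thresholds[g])
        for s, g in zip(scores, groups)
    ]


def recall_by_group(preds, labels, groups):
    """Recall (true positive rate) per group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        if y == 1:
            totals[g] += 1
            hits[g] += p
    return {g: hits[g] / totals[g] for g in totals}


# Toy data (hypothetical): model scores, true labels, and group membership.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.5, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# A single shared cutoff of 0.7 gives unequal recall across groups.
single = predict(scores, groups, {"a": 0.7, "b": 0.7})
recall_single = recall_by_group(single, labels, groups)

# Group-specific cutoffs chosen to reach the same target recall.
thresholds = group_thresholds_for_recall(scores, labels, groups, target_recall=0.5)
adjusted = predict(scores, groups, thresholds)
recall_adjusted = recall_by_group(adjusted, labels, groups)
```

On this toy data the shared cutoff leaves group "b" with zero recall, while the per-group cutoffs bring both groups to the same recall. Real applications would also need to choose the target under a resource constraint, which this sketch ignores.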
Abstract
Applications of machine learning (ML) to high-stakes policy settings -- such as education, criminal justice, healthcare, and social service delivery -- have grown rapidly in recent years, sparking important conversations about how to ensure fair outcomes from these systems. The machine learning research community has responded to this challenge with a wide array of proposed fairness-enhancing strategies for ML models, but despite the large number of methods that have been developed, little empirical work exists evaluating these methods in real-world settings. Here, we seek to fill this research gap by investigating the performance of several methods that operate at different points in the ML pipeline across four real-world public policy and social good problems. Across these problems, we find a wide degree of variability and inconsistency in the ability of many of these methods to…