Fairer and more accurate, but for whom?
Alexandra Chouldechova, Max G'Sell

TL;DR
This paper introduces a framework for comparing machine learning models across subgroups to identify where they differ most in fairness and accuracy, especially in high-stakes decision-making contexts.
Contribution
The paper presents a method for automatically detecting subgroups on which two models differ significantly in fairness and accuracy metrics.
Findings
Identified subgroups with notable fairness disparities in recidivism prediction.
Demonstrated the framework's ability to reveal model differences in hypothetical lending scenarios.
Showed that overall performance metrics can mask important subgroup disparities.
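The last finding can be illustrated with a small sketch. This is not the authors' subgroup-detection method: it only compares two models on subgroups that are already given. The data is made up, and the helper names (`accuracy`, `accuracy_gap_by_group`, `disagreement_rate`) are hypothetical.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def disagreement_rate(pred_a, pred_b):
    """Fraction of cases on which the two models classify differently."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def accuracy_gap_by_group(y_true, pred_a, pred_b, groups):
    """Return {group: accuracy of model B minus accuracy of model A}."""
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    gaps = {}
    for g, rows in rows_by_group.items():
        labels = [y_true[i] for i in rows]
        acc_a = accuracy(labels, [pred_a[i] for i in rows])
        acc_b = accuracy(labels, [pred_b[i] for i in rows])
        gaps[g] = acc_b - acc_a
    return gaps

# Toy data, invented for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["g1", "g1", "g1", "g1", "g2", "g2", "g2", "g2"]
pred_a = [1, 0, 1, 0, 0, 1, 1, 0]  # model A: all correct on g1, half correct on g2
pred_b = [0, 1, 1, 0, 1, 0, 1, 0]  # model B: half correct on g1, all correct on g2

overall_a = accuracy(y_true, pred_a)
overall_b = accuracy(y_true, pred_b)
disagree = disagreement_rate(pred_a, pred_b)
gaps = accuracy_gap_by_group(y_true, pred_a, pred_b, groups)

print(overall_a, overall_b)  # identical overall accuracy
print(disagree)              # yet the models disagree on half the cases
print(gaps)                  # and the per-group gaps have opposite signs
```

In this toy setup both models have the same overall accuracy, yet each is clearly better on one subgroup and worse on the other, which is the kind of difference a single aggregate number hides.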
Abstract
Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is therefore essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. Looking at overall performance metrics, however, may be misleading. Even when two models have comparable overall performance, they may nevertheless disagree in their classifications on a considerable fraction of…
Taxonomy
Topics: Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life
