Achieving Fairness through Adversarial Learning: an Application to Recidivism Prediction
Christina Wadsworth, Francesca Vera, Chris Piech

TL;DR
This paper introduces an adversarial neural network model that predicts recidivism while reducing racial bias, achieving comparable accuracy to existing scores and improving fairness measures in a high-stakes criminal justice context.
Contribution
It presents a novel adversarial training approach to mitigate racial bias in recidivism prediction models, applicable to real-world criminal justice systems.
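The general idea of adversarial debiasing can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' architecture. It uses two logistic models instead of neural networks: a predictor that maps features to an outcome, and an adversary that tries to recover a binary protected attribute from the predictor's logit. The names `train_adversarial`, `predict`, `alpha`, `lr`, and `steps` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_adversarial(X, y, z, alpha=1.0, lr=0.1, steps=300):
    """Toy adversarial debiasing with two logistic models (an assumption).

    X: list of feature lists, y: binary outcome labels,
    z: binary protected attribute, alpha: weight of the adversarial term.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0  # predictor parameters
    u, c = 0.0, 0.0        # adversary parameters (acts on the predictor logit)
    for _ in range(steps):
        gw, gb, gu, gc = [0.0] * d, 0.0, 0.0, 0.0
        for xi, yi, zi in zip(X, y, z):
            s = sum(wj * xj for wj, xj in zip(w, xi)) + b  # predictor logit
            p = sigmoid(s)                                 # predicted P(y = 1)
            q = sigmoid(u * s + c)                         # adversary's P(z = 1)
            # Predictor minimizes BCE(p, y) - alpha * BCE(q, z);
            # this is the derivative of that objective with respect to s.
            gs = (p - yi) - alpha * (q - zi) * u
            for j in range(d):
                gw[j] += gs * xi[j]
            gb += gs
            # Adversary minimizes BCE(q, z) with respect to u and c.
            gu += (q - zi) * s
            gc += q - zi
        # Simultaneous gradient steps for both players.
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
        u -= lr * gu / n
        c -= lr * gc / n
    return w, b, u, c


def predict(X, w, b):
    # Hard 0/1 predictions from the predictor's logit.
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
```

With `alpha=0` the sketch reduces to ordinary logistic regression. Larger values push the predictor toward outputs from which the adversary cannot recover the protected attribute, usually at some cost in accuracy.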
Findings
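The model's predictive accuracy is comparable to that of COMPAS (the abstract reports a gain).
Compared with COMPAS, it comes closer to two of three fairness measures: parity and equality of odds.
The approach can be generalized to other prediction tasks and demographic attributes.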
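To make the two fairness measures concrete, the sketch below computes group gaps under their commonly used definitions, which are assumed here to match the paper's. Parity compares the rate of positive predictions across groups. Equality of odds compares false positive and false negative rates across groups. The function names `rate` and `fairness_gaps` and the 0/1 group coding are hypothetical.

```python
def rate(flags):
    # Fraction of 1s in a list of 0/1 flags; NaN when the list is empty.
    return sum(flags) / len(flags) if flags else float("nan")


def fairness_gaps(y_true, y_pred, group):
    """Absolute gaps between groups 0 and 1 (binary coding is an assumption)."""
    stats = {}
    for g in (0, 1):
        idx = [i for i, gi in enumerate(group) if gi == g]
        # Parity: share of the group predicted positive.
        pos_rate = rate([y_pred[i] for i in idx])
        # Equality of odds: error rates conditioned on the true outcome.
        fpr = rate([y_pred[i] for i in idx if y_true[i] == 0])
        fnr = rate([1 - y_pred[i] for i in idx if y_true[i] == 1])
        stats[g] = (pos_rate, fpr, fnr)
    return {
        "parity_gap": abs(stats[0][0] - stats[1][0]),
        "fpr_gap": abs(stats[0][1] - stats[1][1]),
        "fnr_gap": abs(stats[0][2] - stats[1][2]),
    }
```

A gap of zero on `parity_gap` means both groups are predicted positive at the same rate. Zero on both `fpr_gap` and `fnr_gap` means equality of odds holds on the evaluated data.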
Abstract
Recidivism prediction scores are used across the USA to determine sentencing and supervision for hundreds of thousands of inmates. One such generator of recidivism prediction scores is Northpointe's Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) score, used in states like California and Florida, which past research has shown to be biased against black inmates according to certain measures of fairness. To counteract this racial bias, we present an adversarially-trained neural network that predicts recidivism and is trained to remove racial bias. When comparing the results of our model to COMPAS, we gain predictive accuracy and get closer to achieving two out of three measures of fairness: parity and equality of odds. Our model can be generalized to any prediction and demographic. This piece of research contributes an example of scientific replication and…
Taxonomy
Topics: Ethics and Social Impacts of AI · Criminal Justice and Corrections Analysis · Adversarial Robustness in Machine Learning
