ProPublica's COMPAS Data Revisited

Matias Barenstein

arXiv:1906.04711·econ.GN·July 10, 2019·5 cites

TL;DR

This paper identifies a data processing error in ProPublica's COMPAS dataset that inflates recidivism rates and affects some fairness metrics, highlighting the importance of accurate data handling in algorithmic fairness research.

Contribution

It reveals a critical dataset construction flaw in ProPublica's COMPAS data and demonstrates its impact on recidivism statistics and fairness evaluations.

Findings

01 Over 40% more recidivists included due to error

02 Recidivism rate inflated by over 24%

03 Some statistical measures unaffected by the error

Abstract

I examine the COMPAS recidivism risk score and criminal history data collected by ProPublica in 2016, which fueled intense debate and research in the nascent field of 'algorithmic fairness'. ProPublica's COMPAS data is used in an increasing number of studies to test various definitions of algorithmic fairness. This paper takes a closer look at the actual datasets put together by ProPublica, in particular the sub-datasets built to study the likelihood of recidivism within two years of a defendant's original COMPAS survey screening date. I take a new yet simple approach to visualizing these data: analyzing the distribution of defendants across COMPAS screening dates. I find that ProPublica made an important data processing error when it created these datasets, failing to implement a two-year sample cutoff rule for recidivists (whereas it implemented a two-year sample…
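The cutoff rule described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the field names `screening_date` and `recid`, the collection end date `COLLECTION_END`, and the six toy records are hypothetical and are not taken from ProPublica's files or from the paper. The sketch only shows why keeping recidivists screened after the cutoff, while dropping non-recidivists screened after it, pushes the measured recidivism rate upward.

```python
from datetime import date

# Hypothetical end of the data-collection window (an assumption for this sketch).
COLLECTION_END = date(2016, 4, 1)


def two_years_before(d):
    """Return the date two calendar years before d (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year - 2)
    except ValueError:
        return d.replace(year=d.year - 2, day=28)


def apply_two_year_cutoff(records, collection_end=COLLECTION_END):
    """Keep only defendants screened at least two years before collection ended.

    The same rule is applied to everyone, recidivist or not, so each retained
    defendant has a full two-year observation window.
    """
    cutoff = two_years_before(collection_end)
    return [r for r in records if r["screening_date"] <= cutoff]


def recidivism_rate(records):
    """Share of records flagged as recidivist."""
    if not records:
        return float("nan")
    return sum(1 for r in records if r["recid"]) / len(records)


def counts_by_screening_month(records):
    """Count defendants per (year, month) of screening, for a quick visual check."""
    counts = {}
    for r in records:
        key = (r["screening_date"].year, r["screening_date"].month)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))


# Toy data (hypothetical). The last two rows mimic the kind of error described:
# recidivists screened after the cutoff who were kept in the sample anyway.
toy = [
    {"screening_date": date(2013, 6, 1), "recid": False},
    {"screening_date": date(2013, 9, 15), "recid": True},
    {"screening_date": date(2014, 2, 10), "recid": False},
    {"screening_date": date(2014, 3, 20), "recid": True},
    {"screening_date": date(2014, 8, 5), "recid": True},
    {"screening_date": date(2015, 1, 12), "recid": True},
]

corrected = apply_two_year_cutoff(toy)
uncorrected_rate = recidivism_rate(toy)
corrected_rate = recidivism_rate(corrected)
```

On this toy sample the uncorrected rate is 4/6 and the corrected rate is 2/4, so the direction of the bias matches the finding above; the magnitudes are artifacts of the made-up rows. Tabulating `counts_by_screening_month(toy)` is a crude stand-in for the screening-date distribution the abstract mentions: any records appearing after the cutoff month would all be recidivists.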

Code & Models

Repositories

kwaldenphd/propublica-compas-lab

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education