Risk Scores, Label Bias, and Everything but the Kitchen Sink
Michael Zanger-Tishler, Julian Nyarko, and Sharad Goel

TL;DR
This paper challenges the 'kitchen sink' approach to risk assessment algorithms. When a model is trained to predict a proxy of the true outcome (label bias), including every available feature can make predictions worse. The paper offers a criterion for deciding which features to exclude.
Contribution
It introduces a new criterion for excluding features from risk models affected by label bias, so that a model trained on a proxy outcome better predicts the true outcome.
Findings
Including all features can worsen predictions under label bias.
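A feature should be excluded if, conditional on the other model features, its correlation with the proxy and its correlation with the true outcome have opposite signs.
Geography is a candidate for exclusion in criminal risk assessments: due to patterns of police deployment, it may be only weakly correlated with criminal behavior while both are direct causes of a person's criminal record.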
Abstract
In designing risk assessment algorithms, many scholars promote a "kitchen sink" approach, reasoning that more information yields more accurate predictions. We show, however, that this rationale often fails when algorithms are trained to predict a proxy of the true outcome, as is typically the case. With such "label bias", one should exclude a feature if its correlation with the proxy and its correlation with the true outcome have opposite signs, conditional on the other model features. This criterion is often satisfied when a feature is weakly correlated with the true outcome, and, additionally, that feature and the true outcome are both direct causes of the remaining features. For example, due to patterns of police deployment, criminal behavior and geography may be weakly correlated and direct causes of one's criminal record, suggesting one should exclude geography in criminal risk…
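The exclusion criterion in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or data. It assumes a hypothetical linear Gaussian setup in which `behavior` (the true outcome) and `geography` are independent, both cause `record`, and the proxy label `arrest` depends on both. All variable names, coefficients, and the sample size are illustrative assumptions. It uses a first-order partial correlation as a stand-in for "correlation conditional on the other model features", which is only possible here because the simulation observes the true outcome; in practice the true outcome is unobserved.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    """Pearson correlation."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


def partial_corr(a, b, given):
    """Correlation of a and b after linearly controlling for one variable."""
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag**2) * (1 - r_bg**2))


def fit_two_features(x1, x2, y):
    """OLS slopes of y on (x1, x2), solved from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12**2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data-generating process (an assumption, not from the paper).
random.seed(0)
n = 20_000
behavior = [random.gauss(0, 1) for _ in range(n)]   # true outcome
geography = [random.gauss(0, 1) for _ in range(n)]  # candidate feature
record = [b + g + random.gauss(0, 1) for b, g in zip(behavior, geography)]
arrest = [b + g + random.gauss(0, 1) for b, g in zip(behavior, geography)]  # proxy

# Criterion: compare the sign of geography's conditional correlation with the
# true outcome against its conditional correlation with the proxy.
pc_true = partial_corr(geography, behavior, record)
pc_proxy = partial_corr(geography, arrest, record)
exclude_geography = pc_true * pc_proxy < 0

# Train both models on the proxy. Intercepts are omitted because they do not
# change the correlation between a score and the true outcome.
b_record, b_geo = fit_two_features(record, geography, arrest)
score_full = [b_record * r + b_geo * g for r, g in zip(record, geography)]
slope = cov(record, arrest) / cov(record, record)
score_restricted = [slope * r for r in record]

corr_full = corr(score_full, behavior)
corr_restricted = corr(score_restricted, behavior)
print(f"partial corr with true outcome: {pc_true:.3f}, with proxy: {pc_proxy:.3f}")
print(f"exclude geography: {exclude_geography}")
print(f"corr with true outcome, full: {corr_full:.3f}, restricted: {corr_restricted:.3f}")
```

Under these assumptions, conditioning on `record` makes `geography` negatively associated with `behavior` (both are causes of the record), while `geography` stays positively associated with `arrest`. The signs disagree, so the criterion flags `geography` for exclusion, and the score fitted without it tracks the true outcome more closely than the kitchen-sink score.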
Taxonomy
Topics: Crime Patterns and Interventions
