Addressing Discretization-Induced Bias in Demographic Prediction

Evan Dong; Aaron Schein; Yixin Wang; Nikhil Garg

arXiv:2405.16762·cs.CY·May 28, 2024

TL;DR

This paper investigates how discretization of continuous demographic predictions causes bias, particularly undercounting African-American voters, and proposes a joint optimization method to eliminate this bias with minimal accuracy loss.

Contribution

It introduces a novel joint optimization approach and a data-driven thresholding heuristic to address discretization bias in demographic prediction tasks.
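
One way to picture a data-driven threshold is the sketch below. It is an illustration under stated assumptions, not the paper's heuristic: for a single group in a two-class setting, it labels the top-scoring records until the labeled count matches the count implied by the continuous predictions. The function name `label_by_matched_count` and the scores in `probs` are hypothetical.

```python
def label_by_matched_count(probs):
    """Label the k highest-scoring records, where k is the rounded sum of the scores.

    Assumption for illustration only: matching the labeled count to the sum of
    the continuous predictions. This is not taken from the paper.
    """
    k = round(sum(probs))
    # Indices sorted from highest to lowest score (ties keep input order).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    labels = [0] * len(probs)
    for i in order[:k]:
        labels[i] = 1
    return labels

# Hypothetical scores: every record is below 0.5, yet the scores sum to about 2.
probs = [0.45, 0.40, 0.35, 0.30, 0.30, 0.20]

# Two-class argmax assigns the group only when the score exceeds 0.5.
argmax_labels = [int(p > 0.5) for p in probs]
matched_labels = label_by_matched_count(probs)

print(sum(argmax_labels), sum(matched_labels))
```

In this toy input argmax labels nobody as a group member, while the matched-count rule labels the two highest-scoring records, which is the aggregate the scores themselves imply.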

Findings

01. Discretization bias can significantly undercount certain demographic groups.

02. The proposed method effectively reduces bias with negligible accuracy loss.

03. Calibrated continuous models alone are insufficient to eliminate discretization bias.

Abstract

Racial and other demographic imputation is necessary for many applications, especially in auditing disparities and outreach targeting in political campaigns. The canonical approach is to construct continuous predictions -- e.g., based on name and geography -- and then to discretize the predictions by selecting the most likely class (argmax). We study how this practice produces discretization bias. In particular, we show that argmax labeling, as used by a prominent commercial voter file vendor to impute race/ethnicity, results in a substantial under-count of African-American voters, e.g., by 28.2 percentage points in North Carolina. This bias can have substantial implications in downstream tasks that use such labels. We then introduce a joint optimization approach -- and a tractable data-driven thresholding heuristic -- that can eliminate this bias,…
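
The under-count mechanism described in the abstract can be illustrated with a small synthetic simulation. This is a minimal sketch, assuming a two-class setting and a hypothetical Beta(2, 5) distribution of calibrated scores. None of the numbers come from the paper or from any voter file, and the name `count_minority` is invented for this example.

```python
import random

def count_minority(n=10000, seed=0):
    """Compare three counts of a minority group in a synthetic two-class population."""
    rng = random.Random(seed)
    true_count = 0        # people who actually belong to the group
    expected_count = 0.0  # sum of the continuous predictions
    argmax_count = 0      # people labeled as group members by argmax
    for _ in range(n):
        # Hypothetical calibrated score: the person is in the group with probability p.
        p = rng.betavariate(2, 5)  # mean 2/7, so most scores fall below 0.5
        true_count += rng.random() < p
        expected_count += p
        # With two classes, argmax picks the group only when p > 0.5.
        argmax_count += p > 0.5
    return true_count, expected_count, argmax_count

true_count, expected_count, argmax_count = count_minority()
print(true_count, round(expected_count), argmax_count)
```

Because membership is drawn with probability equal to the score, the scores are calibrated by construction and their sum tracks the true count. The argmax count falls well below both, since many true group members have scores under 0.5.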

Code & Models

Repositories

evan-dong/demographic-prediction-argmax-bias (Official)

Taxonomy

Topics: Insurance, Mortality, Demography, Risk Management