Estimating Racial Disparities When Race is Not Observed
Cory McCartan, Robin Fisher, Jacob Goldin, Daniel E. Ho, Kosuke Imai

TL;DR
This paper introduces BIRDiE, a new scalable method that uses surnames as instrumental variables to estimate racial disparities without direct race data, improving accuracy over existing methods like BISG.
Contribution
The paper develops BIRDiE, a novel Bayesian instrumental regression approach that corrects bias in racial disparity estimates when race is unobserved, using surnames as instruments.
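The identifying assumption behind this approach (given in the abstract) can be written out as a short derivation. The notation is an assumption made for illustration, not taken from the paper: Y is a discrete outcome, R is unobserved race, S is surname, G is residence location, and X is other observed characteristics.

```latex
% Assumption: surname is conditionally independent of the outcome
% given race, location, and other observed characteristics.
Y \perp S \mid R, G, X

% By the law of total probability and the assumption above, the observable
% outcome distribution is a mixture over the unobserved race categories:
\Pr(Y = y \mid S = s, G = g, X = x)
  = \sum_{r} \Pr(Y = y \mid R = r, G = g, X = x)\,
             \Pr(R = r \mid S = s, G = g, X = x)
```

The left-hand side is observable, and the second factor on the right is the kind of race probability that BISG supplies. The first factor is the race-specific outcome distribution, from which disparities are computed.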
Findings
In a validation study, BIRDiE reduces estimation error by up to 84% compared with standard approaches.
The method is scalable to large administrative datasets.
Application to individual-level IRS tax data reveals racial differences in who benefits from the home mortgage interest deduction.
Abstract
The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE),…
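The BISG step mentioned in the abstract can be sketched as a single Bayes' rule update. This is a minimal illustration, not the authors' implementation. The function name `bisg_posterior`, the race categories, and all probabilities are hypothetical. It assumes the standard BISG simplification that surname and location are independent given race.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence into P(race | surname, location).

    Uses Bayes' rule under the assumption that surname and location are
    conditionally independent given race:
        P(r | s, g) is proportional to P(r | s) * P(g | r)
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("probabilities must have positive total mass")
    return {race: weight / total for race, weight in unnormalized.items()}


# Made-up illustrative inputs (NOT Census figures).
# P(race | surname): share of people with this surname in each group.
p_race_given_surname = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "other": 0.05}
# P(location | race): share of each group living in this geographic unit.
p_geo_given_race = {"white": 0.001, "black": 0.004, "hispanic": 0.002, "other": 0.001}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

Probabilities like these are predictions, not observed race. The abstract's point is that their errors can be correlated with the outcome, so using them directly in a disparity estimate tends to attenuate it.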
Taxonomy
Topics: Racial and Ethnic Identity Research
