TL;DR
This paper introduces RIDDLE, a deep learning method that accurately imputes race and ethnicity from electronic medical records, outperforming traditional models and revealing insights into disease patterns across different groups.
Contribution
The study presents a deep neural network approach for imputing race and ethnicity from medical histories. It is more accurate than existing methods such as logistic regression and random forest, and the authors make specific efforts to interpret the trained models.
Findings
RIDDLE significantly outperforms logistic regression and random forest on every metric considered: accuracy, cross-entropy loss, and area under the ROC curve (AUC).
Interpretable features reveal medical indicators predictive of race and ethnicity.
Imputed race and ethnicity help uncover differential disease patterns.
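To make the reported metrics concrete, here is a minimal sketch of how accuracy, cross-entropy loss, and ROC AUC can be computed for a multiclass classifier from its predicted class probabilities. It is not the paper's evaluation code: the function names, the toy labels and probabilities, and the choice of macro-averaged one-vs-rest AUC are all assumptions made for illustration.

```python
import numpy as np

def accuracy(y_true, probs):
    # Fraction of samples whose highest-probability class matches the label.
    return float(np.mean(np.argmax(probs, axis=1) == y_true))

def cross_entropy(y_true, probs, eps=1e-12):
    # Mean negative log-probability assigned to the true class.
    p_true = np.clip(probs[np.arange(len(y_true)), y_true], eps, 1.0)
    return float(-np.mean(np.log(p_true)))

def binary_auc(is_positive, scores):
    # Probability that a random positive scores above a random negative;
    # ties count as one half. Equivalent to the area under the ROC curve.
    pos, neg = scores[is_positive], scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def macro_ovr_auc(y_true, probs):
    # Assumption: average the one-vs-rest AUC over classes (macro averaging).
    n_classes = probs.shape[1]
    return float(np.mean([binary_auc(y_true == k, probs[:, k])
                          for k in range(n_classes)]))

# Hypothetical toy data: 4 samples, 3 classes (not taken from the paper).
y_true = np.array([0, 1, 2, 1])
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.5, 0.3, 0.2],
])

print("accuracy:", accuracy(y_true, probs))
print("cross-entropy:", cross_entropy(y_true, probs))
print("macro one-vs-rest AUC:", macro_ovr_auc(y_true, probs))
```

Higher is better for accuracy and AUC, while lower is better for cross-entropy loss. Comparing models on all three guards against a model that ranks classes well but produces poorly calibrated probabilities.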
Abstract
Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), and area under the curve for receiver operating characteristic plots (all ). We made specific efforts to interpret the trained neural network models to identify,…
