Leveraging text data for causal inference using electronic health records
Reagan Mozer, Aaron R. Kaufman, Leo A. Celi, and Luke Miratrix

TL;DR
This paper introduces a framework that combines natural language processing and statistical text analysis with standard causal inference techniques, using unstructured clinical text from electronic health records to address missing data, confounding bias, and treatment effect heterogeneity.
Contribution
It presents a unified approach integrating text analysis with causal inference methods, demonstrated through an EHR study on treatment effects and patient subgroups.
Findings
Text data enhances causal analysis validity.
Incorporating text identifies patient subgroups benefiting from treatment.
Open-source tools facilitate adoption in clinical research.
Abstract
In studies that rely on data from electronic health records (EHRs), unstructured text data such as clinical progress notes offer a rich source of information about patient characteristics and care that may be missing from structured data. Despite the prevalence of text in clinical research, these data are often ignored for the purposes of quantitative analysis due to their complexity. This paper presents a unified framework for leveraging text data to support causal inference with electronic health data at multiple stages of analysis. In particular, we consider how natural language processing and statistical text analysis can be combined with standard inferential techniques to address common challenges due to missing data, confounding bias, and treatment effect heterogeneity. Through an application to a recent EHR study investigating the effects of a non-randomized medical intervention on…
Taxonomy
Topics: Advanced Causal Inference Techniques · demographic modeling and climate adaptation · Machine Learning in Healthcare
