A Doubly Robust Framework for Addressing Outcome-Dependent Selection Bias in Multi-Cohort EHR Studies
Ritoban Kundu, Xu Shi, Michael Kleinsasser, Lars G. Fritsche, Maxwell Salvatore, Bhramar Mukherjee

TL;DR
This paper proposes a doubly robust statistical framework, JAIPW, to correct outcome-dependent selection bias in multi-cohort EHR studies. It improves estimation accuracy over traditional methods, especially under model misspecification.
Contribution
The paper introduces the JAIPW method, which combines data from multiple cohorts and external samples with a double robustness property to address outcome-dependent selection bias.
Findings
JAIPW reduces bias and RMSE significantly compared to existing methods.
Simulation studies demonstrate up to six times lower bias with JAIPW.
Application to MGI data yields estimates aligned with national benchmarks.
Abstract
Selection bias can hinder accurate estimation of association parameters in binary disease risk models using non-probability samples like electronic health records (EHRs). The issue is compounded when participants are recruited from multiple clinics/centers with varying selection mechanisms that may depend on the disease/outcome of interest. Traditional inverse-probability-weighted (IPW) methods, based on constructed parametric selection models, often struggle with misspecifications when selection mechanisms vary across cohorts. This paper introduces a new Joint Augmented Inverse Probability Weighted (JAIPW) method, which integrates individual-level data from multiple cohorts collected under potentially outcome-dependent selection mechanisms, with data from an external probability sample. JAIPW offers double robustness by incorporating a flexible auxiliary score model to address…
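To make the abstract's starting point concrete, the following is a minimal sketch of plain inverse-probability weighting under outcome-dependent selection. It is not the JAIPW estimator and is not taken from the paper. The simulated disease model, the selection model, all coefficient values, and the helper `fit_weighted_logistic` are assumptions chosen for illustration. The sketch also assumes the true selection probabilities are known, whereas the setting described above estimates them with help from an external probability sample.

```python
import numpy as np

def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def fit_weighted_logistic(X, y, w, n_iter=25):
    """Weighted logistic regression fitted by Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (w * (y - p))                     # weighted score
        hess = (X * (w * p * (1 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
N = 200_000

# Assumed target population: logit P(D=1 | Z) = -2 + 0.5 * Z
Z = rng.normal(size=N)
D = rng.binomial(1, expit(-2.0 + 0.5 * Z))

# Assumed selection mechanism that depends on the outcome D and on Z
pi = expit(-1.0 + 2.0 * D + 1.0 * Z)
S = rng.binomial(1, pi).astype(bool)

# Fit the disease model on the selected sample only
X_sel = np.column_stack([np.ones(S.sum()), Z[S]])
naive = fit_weighted_logistic(X_sel, D[S], np.ones(S.sum()))  # ignores selection
ipw = fit_weighted_logistic(X_sel, D[S], 1.0 / pi[S])         # weights by 1 / pi

naive_slope, ipw_slope = naive[1], ipw[1]
print("true slope: 0.5")
print("naive slope:", round(float(naive_slope), 3))
print("IPW slope:", round(float(ipw_slope), 3))
```

In this simulation the unweighted fit is pulled away from the true slope because selection depends on both the outcome and the covariate, while weighting by the inverse of the correct selection probability removes most of that bias. If the selection model were misspecified, the weighted fit would lose this protection, which is the gap the doubly robust construction is meant to address.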
Taxonomy
Topics: Statistical Methods in Epidemiology · Advanced Causal Inference Techniques
