Methodological variations in lagged regression for detecting physiologic drug effects in EHR data
Matthew E. Levine, David J. Albers, George Hripcsak

TL;DR
This study systematically evaluates how different methodological choices in lagged regression impact the detection of drug effects in EHR data, highlighting the importance of data processing and modeling strategies.
Contribution
It provides a comprehensive analysis of methodological variations in lagged regression for EHR data, identifying key factors that influence detection accuracy.
Findings
The most accurate methods achieved AUROCs of 0.794 and 0.705 against the two gold standards.
Time re-parameterization and specific modeling choices significantly improve performance.
Poor methodological choices can reduce AUROC to near 0.5 (chance level), which underscores how much these choices matter.
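To make the AUROC figures above concrete, here is a minimal sketch of how an AUROC can be computed from a set of scored drug-lab pairs and a binary gold standard. The function name `auroc` and the example scores and labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
def auroc(scores, labels):
    """Rank-based AUROC: the probability that a randomly chosen positive
    item is scored above a randomly chosen negative item (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: one score per drug-lab pair (e.g. the magnitude of a
# fitted lag coefficient) and a gold-standard label (1 = effect expected).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
example_auc = auroc(scores, labels)
```

An AUROC of 0.5 means the scores rank gold-standard positives no better than chance, which is why a drop toward 0.5 signals that a method has lost its ability to detect the expected effects.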
Abstract
We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined the effect of six methodological variations on the performance of lagged linear methods in this context: (i) time series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) differencing (lagged rates of change achieved by taking differences between consecutive measurements), (v) explanatory variables, and (vi) regression models. We generated two gold standards (one knowledge-base derived, one expert-curated) for expected pairwise relationships between 7 drugs and 4 labs, and evaluated how well the 64 unique combinations of methodological perturbations reproduce the gold standards. Our 28 cohorts included patients in the Columbia University Medical Center/NewYork-Presbyterian Hospital clinical database.…
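The core technique named in the abstract, lagged linear regression with optional intra-subject normalization and differencing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes one subject, evenly indexed and already aligned drug and lab series, ordinary least squares as the regression model, and differencing applied to both series before normalization. The names `zscore`, `lagged_design`, and `fit_lagged` are hypothetical.

```python
import numpy as np


def zscore(x):
    """Intra-subject normalization: center and scale one subject's series."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()


def lagged_design(drug, lab, n_lags):
    """Pair lab[t] with drug[t-1], ..., drug[t-n_lags] for every valid t."""
    drug = np.asarray(drug, dtype=float)
    lab = np.asarray(lab, dtype=float)
    if len(drug) != len(lab):
        raise ValueError("drug and lab series must have equal length")
    if len(lab) <= n_lags:
        raise ValueError("series too short for the requested number of lags")
    n = len(drug)
    # Column k-1 holds the drug value k steps before each lab measurement.
    X = np.column_stack([drug[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    y = lab[n_lags:]
    return X, y


def fit_lagged(drug, lab, n_lags=3, normalize=True, difference=False):
    """Return one OLS coefficient per lag (intercept fitted but dropped)."""
    drug = np.asarray(drug, dtype=float)
    lab = np.asarray(lab, dtype=float)
    if difference:
        # Assumption: difference both series (consecutive-measurement changes).
        drug, lab = np.diff(drug), np.diff(lab)
    if normalize:
        drug, lab = zscore(drug), zscore(lab)
    X, y = lagged_design(drug, lab, n_lags)
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Synthetic check: the lab responds to the drug with a delay of two steps.
rng = np.random.default_rng(0)
drug = rng.normal(size=200)
lab = np.zeros(200)
lab[2:] = 2.0 * drug[:-2]
coefs = fit_lagged(drug, lab, n_lags=3, normalize=False)  # close to [0, 2, 0]
```

In this synthetic case the fitted coefficients isolate the lag at which the drug signal reaches the lab value. Real EHR series are irregularly sampled, so how the time series are constructed and how time is parameterized (variations (i) and (ii) in the abstract) would have to be settled before a design matrix like this one can be built.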
