Unlocking Retrospective Prevalent Information in EHRs -- a Pairwise Pseudolikelihood Approach
Nir Keret, Malka Gorfine

TL;DR
This paper introduces a novel statistical method for analyzing retrospective prevalent data in electronic health records, improving estimation efficiency and enabling better genetic variant replication analysis.
Contribution
It develops consistent estimators for disease-onset age models that incorporate prevalent data, overcoming computational challenges and enhancing analysis of large-scale EHR repositories.
Findings
The method yields approximately twice as many replicated genetic discoveries.
Simulations show substantial efficiency gains over existing approaches.
Application to bladder cancer data demonstrates practical utility.
Abstract
Typically, electronic health record data are not collected towards a specific research question. Instead, they comprise numerous observations recruited at different ages, whose medical, environmental and oftentimes also genetic data are being collected. Some phenotypes, such as disease-onset ages, may be reported retrospectively if the event preceded recruitment, and such observations are termed "prevalent". The standard method to accommodate this "delayed entry" conditions on the entire history up to recruitment, hence the retrospective prevalent failure times are conditioned upon and cannot participate in estimating the disease-onset age distribution. An alternative approach conditions just on survival up to recruitment age, plus the recruitment age itself. This approach allows incorporating the prevalent information but brings about numerical and computational difficulties. In this…
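To make the standard "delayed entry" conditioning concrete, the following is a minimal sketch under deliberately simple assumptions that are not taken from the paper: a constant (exponential) onset hazard, no covariates, recruitment ages uniform on 40 to 70, and 10 years of follow-up. The function names `simulate_cohort` and `delayed_entry_mle` are hypothetical. It illustrates only the standard approach, in which prevalent cases are conditioned upon and contribute nothing to the estimate; it does not implement the pairwise pseudolikelihood.

```python
import random


def simulate_cohort(n, rate, seed=0):
    """Simulate (onset_age, recruit_age, end_age) triples.

    Assumptions for illustration only: onset ages are exponential with
    the given rate, recruitment ages are uniform on [40, 70], and every
    subject is followed for 10 years after recruitment.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        onset = rng.expovariate(rate)
        recruit = rng.uniform(40.0, 70.0)
        rows.append((onset, recruit, recruit + 10.0))
    return rows


def delayed_entry_mle(rows):
    """Exponential-rate MLE that conditions on being event-free at recruitment.

    Each subject is at risk only from recruitment age onward. Prevalent
    cases (onset before recruitment) are skipped entirely, which is the
    information loss described in the abstract.
    Returns (rate_estimate, number_of_prevalent_cases).
    """
    events = 0
    time_at_risk = 0.0
    prevalent = 0
    for onset, recruit, end in rows:
        if onset < recruit:
            prevalent += 1  # conditioned upon; contributes nothing
            continue
        if onset <= end:
            events += 1
            time_at_risk += onset - recruit  # observed incident event
        else:
            time_at_risk += end - recruit  # censored at end of follow-up
    return events / time_at_risk, prevalent


if __name__ == "__main__":
    cohort = simulate_cohort(20000, rate=0.02, seed=1)
    estimate, n_prevalent = delayed_entry_mle(cohort)
    print(f"estimated rate: {estimate:.4f}")
    print(f"prevalent cases left unused: {n_prevalent} of {len(cohort)}")
```

In this toy setting the estimate stays close to the true rate, but a large share of the simulated cohort is prevalent and goes unused, which is the efficiency loss that motivates incorporating the retrospective onset ages.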
Taxonomy
Topics: Cancer Genomics and Diagnostics · Statistical Methods and Inference · Genetic Associations and Epidemiology
