Risk Prediction with Imperfect Survival Outcome Information from Electronic Health Records

Stephanie F. Chan, Jue Hou, Xuan Wang, and Tianxi Cai

arXiv:2103.04409 · stat.ME · March 9, 2021 · 1 citation

TL;DR

This paper introduces a semisupervised risk prediction method that combines a small set of labeled data with abundant but imperfect proxy data from electronic health records (EHRs) to predict the time of disease onset.

Contribution

It develops a semisupervised approach that combines proxy data with a limited number of labels under a flexible measurement error model for the proxy, and establishes the consistency and asymptotic properties of the resulting estimator.
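The contribution hinges on three quantities: the true onset time (unobserved), a cheap proxy such as the time of the first diagnostic code, and a current-status label recording only whether onset occurred by a follow-up time. The short sketch below makes the gap concrete with made-up numbers and a deliberately simple additive-delay error model; that model and the helper `current_status` are assumptions for illustration, not the paper's flexible measurement error model.

```python
def current_status(times, follow_up):
    """Hypothetical helper: 1 if the event time is at or before follow-up, else 0."""
    return [int(t <= c) for t, c in zip(times, follow_up)]


# Made-up data in years since baseline (not from the paper).
true_onset = [2.0, 5.5, 1.0, 8.0, 3.0]
code_delay = [0.5, 0.0, 3.0, 0.0, 4.0]  # assumed lag before the first diagnostic code
follow_up = [3.0, 6.0, 2.5, 7.0, 5.0]

# Toy measurement error model: proxy = true onset + nonnegative delay.
proxy_onset = [t + d for t, d in zip(true_onset, code_delay)]

true_status = current_status(true_onset, follow_up)    # what a chart reviewer would label
proxy_status = current_status(proxy_onset, follow_up)  # what the proxy alone implies
misclassified = sum(a != b for a, b in zip(true_status, proxy_status))
print(true_status, proxy_status, misclassified)
```

Here two patients whose first code lags past their follow-up time are wrongly treated as disease-free by the proxy. A pure delay only produces errors in one direction; real EHR proxies can err either way, so this additive model is only a toy.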

Findings

1. Performs well in finite-sample simulations.
2. Is effective in predicting obesity onset from EHR data.
3. Provides a resampling-based interval estimation method.
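The third finding mentions resampling-based interval estimation. This page does not say which scheme the paper uses; one widely used option for estimators of this kind is perturbation resampling, in which each patient's contribution is reweighted by independent positive weights with mean 1 and variance 1 (here Exp(1)) and the estimator is recomputed. The hedged sketch applies it to the simplest possible estimator, the proportion of labeled patients with onset by follow-up; `perturbation_interval` and the labels are hypothetical.

```python
import random
import statistics


def perturbation_interval(labels, n_resamples=1000, alpha=0.05, seed=0):
    """Hypothetical helper: perturbation-resampling standard error and
    percentile interval for the proportion of labels equal to 1."""
    rng = random.Random(seed)
    n = len(labels)
    point = sum(labels) / n
    estimates = []
    for _ in range(n_resamples):
        # Exp(1) weights are positive with mean 1 and variance 1.
        w = [rng.expovariate(1.0) for _ in range(n)]
        # Recompute the estimator with each patient's term reweighted.
        estimates.append(sum(wi * d for wi, d in zip(w, labels)) / sum(w))
    estimates.sort()
    se = statistics.stdev(estimates)  # resampling-based standard error
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return point, se, (lo, hi)


labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # made-up current-status labels
point, se, (lo, hi) = perturbation_interval(labels)
print(point)  # 0.4
```

In a fuller analysis the weights would typically multiply every term of the estimating equations before they are re-solved, rather than a single mean; the percentile interval is used here only for simplicity.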

Abstract

Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses rely on poor proxies. Because detailed documentation is lacking and manual annotation is labor-intensive, it is often feasible to ascertain only the current status of the disease at a follow-up time, rather than the exact onset time, and only for a small subset of patients. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on current status and a large number of unlabeled observations with imperfect proxies. Under a semiparametric transformation model for the onset time and a highly flexible measurement error model for the proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from the proxies and the limited labels. From an initial estimator…
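As a reference point, one common way to write a semiparametric transformation model and the current-status likelihood it implies for the labeled subset is sketched below. The notation ($T$ onset time, $C$ follow-up time, $Z$ covariates, $H$ an unknown increasing function, $F$ a known error distribution, $\beta$ regression coefficients) is assumed here and may differ from the paper's, and the likelihood assumes $C$ is independent of $T$ given $Z$.

```latex
% Transformation model: H unknown and increasing, epsilon has known CDF F
H(T) = -\beta^\top Z + \varepsilon
\quad\Longrightarrow\quad
P(T \le t \mid Z) = F\{H(t) + \beta^\top Z\}

% Current-status label at follow-up time C
\Delta = I(T \le C)

% Log-likelihood from n labeled patients
\ell_n(\beta, H) = \sum_{i=1}^{n} \Big[ \Delta_i \log F\{H(C_i) + \beta^\top Z_i\}
  + (1 - \Delta_i) \log\big(1 - F\{H(C_i) + \beta^\top Z_i\}\big) \Big]
```

Choosing $F(x) = 1 - \exp(-e^{x})$ gives the proportional hazards model with cumulative baseline hazard $e^{H(t)}$, and the logistic $F$ gives the proportional odds model. With only a small labeled set, this likelihood alone carries limited information, which is the motivation the abstract gives for also drawing on the proxies.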

Taxonomy

Topics: Machine Learning in Healthcare · Genetic Associations and Epidemiology · Artificial Intelligence in Healthcare