Salvaging Forbidden Treasure in Medical Data: Utilizing Surrogate Outcomes and Single Records for Rare Event Modeling
Xiaohui Yin, Shane Sacco, Robert H. Aseltine, Fei Wang, Kun Chen

TL;DR
This paper introduces a hybrid learning framework that leverages surrogate outcomes and single-record patient data from electronic health records to improve modeling of rare events like suicide attempts.
Contribution
It presents a novel integrative approach combining supervised and unsupervised learning to utilize otherwise discarded single-record data and surrogate outcomes for rare event prediction.
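The summary does not specify the model, so the following is only an illustrative sketch of the supervised half of the idea, not the authors' method. It assumes a multi-outcome logistic regression in which the rare primary outcome and the more common surrogate outcomes are all driven by one shared low-rank latent score. Under that assumption, the surrogates help estimate the latent direction that the rare outcome depends on. The function names (`fit_shared_latent_logistic`, `predict_proba`), the rank, the learning rate, and the synthetic data are all hypothetical, and the unsupervised single-record component is not modeled here.

```python
import numpy as np


def sigmoid(z):
    # Logistic function mapping logits to probabilities.
    return 1.0 / (1.0 + np.exp(-z))


def fit_shared_latent_logistic(X, Y, rank=1, lr=0.5, n_iter=1000, seed=0):
    """Hypothetical gradient-descent fit of
    logit P(Y[:, k] = 1 | x) = b[k] + (x @ U) @ V[k].

    Column 0 of Y is the rare primary outcome; the other columns are surrogates.
    Every outcome is driven by the same rank-r latent score x @ U.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]
    U = 0.1 * rng.standard_normal((p, rank))  # features -> latent scores
    V = 0.1 * rng.standard_normal((q, rank))  # latent scores -> each outcome
    b = np.zeros(q)                           # per-outcome intercepts
    eps = 1e-12
    losses = []
    for _ in range(n_iter):
        Z = X @ U                       # n x r latent scores
        P = sigmoid(Z @ V.T + b)        # n x q predicted probabilities
        losses.append(-np.mean(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps)))
        G = (P - Y) / (n * q)           # gradient of mean log-loss w.r.t. logits
        grad_U = X.T @ (G @ V)          # chain rule through Z = X @ U
        grad_V = G.T @ Z
        grad_b = G.sum(axis=0)
        U -= lr * grad_U                # simultaneous update of all parameters
        V -= lr * grad_V
        b -= lr * grad_b
    return U, V, b, losses


def predict_proba(X, U, V, b):
    # Predicted probability of each outcome for each row of X.
    return sigmoid((X @ U) @ V.T + b)


# Synthetic demo (made-up values): one rare primary outcome and two more
# common surrogates generated from a single shared latent score.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 5))
z_true = X @ np.array([1.0, -0.5, 0.0, 0.8, 0.0])
intercepts = np.array([-3.0, -1.0, -0.5])  # the primary outcome is the rarest
loadings = np.array([1.5, 1.0, 1.2])
Y = (rng.random((400, 3)) < sigmoid(np.outer(z_true, loadings) + intercepts)).astype(float)

U, V, b, losses = fit_shared_latent_logistic(X, Y, rank=1)
probs = predict_proba(X, U, V, b)
print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The low-rank coupling is one simple way to let frequent surrogate labels inform a rare label. A real analysis would also need held-out evaluation and some treatment of the single-record patients, which this sketch omits.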
Findings
Single-record data contain valuable information for rare event modeling.
Using surrogate outcomes improves the accuracy of suicide risk prediction.
The method significantly outperforms traditional models that exclude single-record patients.
Abstract
The vast repositories of Electronic Health Records (EHR) and medical claims hold untapped potential for studying rare but critical events, such as suicide attempts. Conventional setups often model suicide attempt as a univariate outcome and also exclude any "single-record" patients, those with only one documented encounter, due to a lack of historical information. However, patients who were diagnosed with a suicide attempt at their only encounter could, perhaps surprisingly, represent a substantial proportion of all attempt cases in the data, as high as 70–80%. We develop a hybrid and integrative learning framework to leverage concurrent outcomes as surrogates and harness the forbidden yet precious information from single-record data. Our approach employs a supervised learning component to learn the latent variables that connect primary (e.g., suicide) and surrogate outcomes (e.g., mental…
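The abstract's point that single-record patients can account for a large share of attempt cases is easy to check on one's own data. Below is a minimal sketch, assuming a hypothetical encounter table given as `(patient_id, encounter_id, diagnosis_codes)` tuples and a placeholder outcome code `"SA"`; neither the schema nor the code comes from the paper.

```python
from collections import defaultdict


def single_record_case_share(encounters, outcome_code):
    """Fraction of patients with `outcome_code` who have exactly one encounter.

    `encounters` is an iterable of (patient_id, encounter_id, codes) tuples,
    where `codes` is a set of diagnosis codes recorded at that encounter
    (a hypothetical schema used only for illustration).
    """
    visits = defaultdict(set)  # patient_id -> distinct encounter ids
    cases = set()              # patients ever diagnosed with the outcome
    for patient_id, encounter_id, codes in encounters:
        visits[patient_id].add(encounter_id)
        if outcome_code in codes:
            cases.add(patient_id)
    if not cases:
        return 0.0
    single = sum(1 for pid in cases if len(visits[pid]) == 1)
    return single / len(cases)


# Toy data: "SA" is a placeholder code for a suicide-attempt diagnosis.
toy = [
    ("p1", "e1", {"SA"}),           # single-record case
    ("p2", "e2", {"SA", "DEP"}),    # single-record case
    ("p3", "e3", {"DEP"}),
    ("p3", "e4", {"SA"}),           # case with history
    ("p4", "e5", {"DEP"}),          # single-record non-case
]
share = single_record_case_share(toy, "SA")
print(f"{share:.3f}")  # prints 0.667
```

In this toy table, a pipeline that requires at least two encounters would discard two of the three attempt cases, which is the kind of loss the framework aims to avoid.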
Taxonomy
Topics: Machine Learning in Healthcare · Forensic and Genetic Research · Data Quality and Management
