Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction
Jue Hou, Zijian Guo, Tianxi Cai

TL;DR
This paper introduces a surrogate-assisted semi-supervised learning (SAS) method for high-dimensional risk prediction with EHR data. It combines a large unlabeled data set containing candidate predictors and outcome surrogates with a small labeled data set to improve inference and prediction accuracy.
Contribution
The paper develops a novel semi-supervised approach that combines surrogates and high-dimensional predictors for robust risk modeling with valid inference.
Findings
Outperforms existing supervised methods in simulations
Provides valid risk inference even with model mis-specification
Successfully applied to genetic risk prediction of type-2 diabetes
Abstract
Risk modeling with EHR data is challenging due to a lack of direct observations on the disease outcome and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate assisted semi-supervised learning (SAS) approach to risk modeling with high dimensional predictors, leveraging a large unlabeled data set on candidate predictors and surrogates of the outcome, as well as a small labeled data set with annotated outcomes. The SAS procedure borrows information from surrogates along with candidate predictors to impute the unobserved outcomes via a sparse working imputation model. It uses moment conditions to achieve robustness against mis-specification in the imputation model and a one-step bias correction to enable interval estimation for the predicted risk. We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional…
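The impute-then-fit idea in the abstract can be illustrated with a minimal sketch. This is not the authors' SAS estimator: it omits the moment conditions and the one-step bias correction, uses a low-dimensional simulated data set, and fits both models with a plain proximal-gradient L1-penalized logistic regression. All names, the data-generating model, and the tuning values (`lam`, `step`, `iters`) are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict(coef, x):
    # coef[0] is the intercept; coef[1:] pairs with the features in x.
    return sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], x)))


def fit_l1_logistic(rows, targets, lam, step=0.4, iters=300):
    # Proximal-gradient (ISTA) fit of an L1-penalized logistic model.
    # Targets may be fractional in [0, 1], so imputed outcomes can be used.
    n, p = len(rows), len(rows[0])
    coef = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for x, y in zip(rows, targets):
            r = predict(coef, x) - y
            grad[0] += r
            for j, v in enumerate(x):
                grad[j + 1] += r * v
        coef[0] -= step * grad[0] / n  # intercept is not penalized
        for j in range(1, p + 1):
            coef[j] = soft_threshold(coef[j] - step * grad[j] / n, step * lam)
    return coef


# Simulated data (hypothetical): sparse true risk model plus a noisy surrogate.
random.seed(0)
p, n_total, n_labeled = 5, 600, 100
true_beta = [1.5, -1.0, 0.0, 0.0, 0.0]
X, Y, S = [], [], []
for _ in range(n_total):
    x = [random.gauss(0, 1) for _ in range(p)]
    prob = sigmoid(sum(b * v for b, v in zip(true_beta, x)))
    y = 1 if random.random() < prob else 0
    X.append(x)
    Y.append(y)
    S.append(y + random.gauss(0, 0.5))  # surrogate: outcome plus noise

# Step 1: sparse working imputation model on the labeled subset,
# using predictors and the surrogate together.
W = [x + [s] for x, s in zip(X, S)]
gamma = fit_l1_logistic(W[:n_labeled], Y[:n_labeled], lam=0.02)

# Step 2: impute outcomes for every record (labels are not needed here),
# then fit the risk model on the predictors alone.
imputed = [predict(gamma, w) for w in W]
beta = fit_l1_logistic(X, imputed, lam=0.02)

risk_first_record = predict(beta, X[0])
```

In this sketch only the first `n_labeled` outcomes are ever used for fitting; the remaining records contribute through their predictors and surrogate. Interval estimation for the predicted risk, which the abstract attributes to the one-step bias correction, is outside what the sketch covers.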
Taxonomy
Topics: Genetic Associations and Epidemiology · Cancer-related molecular mechanisms research · Gene expression and cancer classification
