A synthetic data integration framework to leverage external summary-level information from heterogeneous populations
Tian Gu, Jeremy M.G. Taylor, Bhramar Mukherjee

TL;DR
This paper introduces a flexible imputation-based framework that integrates external summary-level information with internal data to enhance risk prediction models, accommodating heterogeneity across populations and incomplete predictor sets.
Contribution
It proposes a novel synthetic data imputation method that leverages external summary data to improve internal model estimation and prediction accuracy.
Findings
Improves statistical efficiency of internal model coefficients.
Enhances prediction accuracy using partial external information.
Provides inference for heterogeneous external populations.
Abstract
There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, such as regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithms used to predict the outcome Y from these predictors may or may not be known. The underlying populations corresponding to each external model may differ from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology where the goal is to fit a target regression model with all available predictors in the…
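The general idea of synthetic-data integration can be illustrated with a small simulation. This is a minimal sketch under stated assumptions and is not the authors' algorithm. It assumes a single external logistic model of Y on shared predictors X with reported coefficients `beta_ext`, and a biomarker B measured only internally. Synthetic X rows are resampled from the internal data, which assumes a similar X distribution. Synthetic Y is drawn from the external model, and B is imputed from a Gaussian linear model of B given (X, Y) fitted on the internal data. The internal and synthetic rows are then stacked, and the target model Y ~ X + B is fitted with down-weighted synthetic rows. All names (`expit`, `fit_logistic`, `synthetic_stack_fit`, `m_syn`, `syn_weight`) and the fixed weighting scheme are hypothetical. The sketch makes no adjustment for heterogeneity between populations.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(design, y, weights=None, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson."""
    n, p = design.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(design @ beta)
        grad = design.T @ (w * (y - mu))                      # weighted score
        hess = (design * (w * mu * (1 - mu))[:, None]).T @ design  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def synthetic_stack_fit(x_int, b_int, y_int, beta_ext, m_syn, syn_weight, rng):
    """Hypothetical sketch: stack internal data with external-model synthetic data."""
    n = x_int.shape[0]
    ones = np.ones((n, 1))
    # 1) Imputation model for B given (X, Y), fitted on internal data only
    d_imp = np.hstack([ones, x_int, y_int[:, None]])
    gamma, *_ = np.linalg.lstsq(d_imp, b_int, rcond=None)
    sigma = np.std(b_int - d_imp @ gamma, ddof=d_imp.shape[1])
    # 2) Synthetic X: resample internal rows (assumes a similar external X distribution)
    x_syn = x_int[rng.integers(0, n, size=m_syn)]
    ones_syn = np.ones((m_syn, 1))
    # 3) Synthetic Y drawn from the external model's reported coefficients
    p_syn = expit(np.hstack([ones_syn, x_syn]) @ beta_ext)
    y_syn = rng.binomial(1, p_syn).astype(float)
    # 4) Impute the internal-only biomarker B for the synthetic rows
    b_syn = np.hstack([ones_syn, x_syn, y_syn[:, None]]) @ gamma
    b_syn = b_syn + rng.normal(0.0, sigma, size=m_syn)
    # 5) Stack the data and fit the target model Y ~ X + B with weighted synthetic rows
    design = np.vstack([np.hstack([ones, x_int, b_int[:, None]]),
                        np.hstack([ones_syn, x_syn, b_syn[:, None]])])
    y_all = np.concatenate([y_int, y_syn])
    w_all = np.concatenate([np.ones(n), np.full(m_syn, syn_weight)])
    return fit_logistic(design, y_all, w_all)

# Usage on simulated data (all values are illustrative, not from the paper)
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
b = 0.5 * x[:, 0] + rng.normal(size=n)
y = rng.binomial(1, expit(-0.5 + 0.8 * x[:, 0] - 0.4 * x[:, 1] + 0.6 * b)).astype(float)
beta_ext = np.array([-0.3, 1.0, -0.4])  # hypothetical external coefficients for Y ~ X
coef = synthetic_stack_fit(x, b, y, beta_ext, m_syn=2000, syn_weight=0.25, rng=rng)
print(coef)  # intercept, X1, X2, B
```

In this sketch, B is imputed conditional on Y as well as X. Imputing from X alone would treat B as independent of the outcome in the synthetic rows and would bias its coefficient toward zero. The fixed `syn_weight` is a placeholder. How much weight the synthetic data should receive is a substantive modeling choice.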
