Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores
Rui Duan, Chenyin Gao, Justin Tubbs, Yi Han, Min Guo, Sijia Li, Erica Ma, Dailin Luo, Jordan Smoller, Phil Lee

TL;DR
This paper introduces UNSemblePRS, an unsupervised method that combines pre-trained polygenic risk score (PRS) models without requiring phenotype data or GWAS from the target population, making it easier to deploy in real-world settings.
Contribution
The novel contribution is an unsupervised ensemble learning framework for integrating PRS models without requiring phenotype data or GWAS from the target population.
Findings
UNSemblePRS demonstrated robust performance across diverse populations in the All of Us database.
The method aggregates PRS models based on prediction concordance without needing observed phenotypes.
It remains scalable and applicable as the number of available PRS models increases.
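The idea of weighting PRS models by how well their predictions agree, with no observed phenotypes, can be illustrated with a small sketch. The summary here does not state how UNSemblePRS turns concordance into weights, so the code below assumes a generic heuristic: weights taken from the leading eigenvector of the between-model correlation matrix. The function names (`standardize`, `concordance_weights`, `ensemble_prs`) and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np


def standardize(scores):
    """Center each model's scores (columns) and scale them to unit variance."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)


def concordance_weights(scores):
    """Unsupervised weights from agreement between models.

    ASSUMPTION: uses the leading eigenvector of the model-by-model
    correlation matrix; this is an illustrative heuristic, not
    necessarily the UNSemblePRS weighting rule.
    """
    corr = np.corrcoef(standardize(scores), rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    lead = eigvecs[:, -1]              # eigenvector of the largest eigenvalue
    if lead.sum() < 0:                 # eigenvector sign is arbitrary
        lead = -lead
    lead = np.clip(lead, 0.0, None)    # models that disagree get zero weight
    return lead / lead.sum()


def ensemble_prs(scores):
    """Weighted sum of standardized scores; no phenotype is used."""
    return standardize(scores) @ concordance_weights(scores)


# Synthetic illustration only: four hypothetical PRS models that track
# the same latent risk with increasing amounts of noise.
rng = np.random.default_rng(0)
latent_risk = rng.normal(size=20000)
noise_sd = np.array([0.5, 1.0, 2.0, 4.0])
scores = latent_risk[:, None] + rng.normal(size=(20000, 4)) * noise_sd

weights = concordance_weights(scores)
combined = ensemble_prs(scores)
print("weights:", np.round(weights, 3))
```

In this toy setup the noisier models agree less with the others and so receive smaller weights. Clipping negative entries to zero is a design choice of the sketch, made so that a model that runs against the consensus cannot enter with a negative sign.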
Abstract
The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging because of limited transferability, data heterogeneity, and the scarcity of observed phenotypes in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive accuracy of genetic risk assessments, but most existing methods rely on observed phenotype data or additional genome-wide association studies (GWAS) from the target population to optimize ensemble weights, limiting their utility in real-time implementation. Here, we present the UNSupervised enSemble PRS (UNSemblePRS), an unsupervised ensemble learning framework that combines…
Taxonomy
Topics: Fault Detection and Control Systems · Anomaly Detection Techniques and Applications · Machine Learning in Healthcare
