# Unsupervised Ensemble Learning for Efficient Integration of Pre-trained Polygenic Risk Scores

**Authors:** Rui Duan, Chenyin Gao, Justin Tubbs, Yi Han, Min Guo, Sijia Li, Erica Ma, Dailin Luo, Jordan Smoller, Phil Lee

PMC · DOI: 10.21203/rs.3.rs-5976048/v1 · Research Square · 2025-04-01

## TL;DR

This paper introduces UNSemblePRS, an unsupervised ensemble method that combines pre-trained polygenic risk score (PRS) models without requiring phenotype data or GWAS from the target population, making it easier to deploy in real-world settings.

## Contribution

The novel contribution is an unsupervised ensemble learning framework for integrating PRS models without requiring phenotype data or GWAS from the target population.

## Key findings

- UNSemblePRS demonstrated robust performance across diverse populations in the All of Us database, on both continuous and binary traits.
- The method aggregates PRS models based on prediction concordance without needing observed phenotypes.
- It scales with the number of candidate models and remains applicable as more pre-trained PRS models become available.

## Abstract

The growing availability of pre-trained polygenic risk score (PRS) models has enabled their integration into real-world applications, reducing the need for extensive data labeling, training, and calibration. However, selecting the most suitable PRS model for a specific target population remains challenging, due to issues such as limited transferability, data heterogeneity, and the scarcity of observed phenotypes in real-world settings. Ensemble learning offers a promising avenue to enhance the predictive accuracy of genetic risk assessments, but most existing methods rely on observed phenotype data or additional genome-wide association studies (GWAS) from the target population to optimize ensemble weights, limiting their utility in real-time implementation. Here, we present the UNSupervised enSemble PRS (UNSemblePRS), an unsupervised ensemble learning framework that combines pre-trained PRS models without requiring phenotype data or summaries from the target population. Unlike traditional supervised approaches, UNSemblePRS aggregates models based on prediction concordance across a curated subset of candidate PRS models. We evaluated UNSemblePRS using both continuous and binary traits in the All of Us database, demonstrating its scalability and robust performance across diverse populations. These results underscore UNSemblePRS as an accessible tool for integrating PRS models into real-world contexts, offering broad applicability as the availability of PRS models continues to expand.
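
The abstract says that models are aggregated by prediction concordance but does not spell out the algorithm. As a rough illustration of the general idea only, the sketch below weights each candidate PRS by its loading on the leading eigenvector of the inter-model correlation matrix, so that models agreeing with the consensus get more weight and no phenotype is ever used. This is a generic spectral weighting scheme chosen for illustration, not the authors' UNSemblePRS procedure; the function names `concordance_weights` and `ensemble_score` and the simulated data are hypothetical.

```python
import numpy as np


def standardize(scores):
    """Center and scale each model's scores (columns) to mean 0, SD 1."""
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)


def concordance_weights(scores):
    """Hypothetical helper: phenotype-free weights from model agreement.

    scores: (n_individuals, n_models) array of pre-trained PRS predictions.
    Returns nonnegative weights that sum to 1.
    """
    corr = np.corrcoef(standardize(scores), rowvar=False)
    # eigh returns eigenvalues in ascending order; take the last eigenvector.
    _, eigvecs = np.linalg.eigh(corr)
    v = eigvecs[:, -1]
    # Eigenvectors are defined up to sign; orient toward the consensus.
    if v.sum() < 0:
        v = -v
    # Models that disagree with the consensus get zero weight.
    v = np.clip(v, 0.0, None)
    return v / v.sum()


def ensemble_score(scores):
    """Weighted sum of standardized PRS columns."""
    return standardize(scores) @ concordance_weights(scores)


# Simulated data (an assumption for illustration, not from the paper):
# three PRS models track a shared latent signal with increasing noise,
# and a fourth model is pure noise.
rng = np.random.default_rng(0)
n = 2000
latent = rng.normal(size=n)
informative = [latent + sd * rng.normal(size=n) for sd in (0.5, 1.0, 1.5)]
uninformative = rng.normal(size=n)
scores = np.column_stack(informative + [uninformative])

w = concordance_weights(scores)
combined = ensemble_score(scores)
print(np.round(w, 3))
```

In this toy setting the pure-noise model shares little correlation with the others and so receives the smallest weight. Real PRS models can also agree with each other for reasons unrelated to accuracy (for example, shared training data), which is presumably one reason the paper restricts aggregation to a curated subset of candidates.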

---
Source: https://tomesphere.com/paper/PMC11998766