Exploring Random Forest in Genetic Risk Score Construction
Vaishnavi Venkat, Kaylyn Clark, X. Jessie Jeng, Tsung‐Chieh Yao, Hui‐Ju Tsai, Tzu‐Pin Lu, Tzu‐Hung Hsiao, Ching‐Heng Lin, Shannon Holloway, Cathrine Hoyo, Shin‐Yi Chou, Hui Wang, Wan‐Ping Lee, Li‐San Wang, Jung‐Ying Tzeng

TL;DR
This paper explores using random forest models to build genetic risk scores that capture complex genetic interactions better than traditional additive methods.
Contribution
The study introduces two novel random forest-based genetic risk score strategies, ctRF and wRF, which improve performance by capturing interactions among genetic variants and by incorporating base data information when it is available.
Findings
ctRF outperforms other random forest-based and classical additive models for traits with complex genetic architectures.
Incorporating informative base data into random forest-based genetic risk scores enhances predictive accuracy.
Random forest-based genetic risk scores effectively capture nonlinear genetic interactions in complex traits.
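To illustrate why a tree-based model can pick up an interaction that an additive score misses, here is a minimal toy sketch. It is not the paper's method or data: the two-SNP XOR-style trait, the variable names, and the hand-built depth-two partition (standing in for a single tree) are all assumptions made for illustration.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data: carrier status (0/1) at two SNPs, with the trait
# present only when exactly one SNP is carried (a pure XOR-style interaction).
samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))


trait = [s[2] for s in samples]

# Marginal (single-SNP) signal: this is what an additive model relies on.
marginal_cov = [covariance([s[i] for s in samples], trait) for i in (0, 1)]

# A depth-two partition (split on SNP 1, then SNP 2) gives one leaf per
# genotype combination; each leaf predicts the mean trait value inside it.
leaves = {}
for a, b, t in samples:
    leaves.setdefault((a, b), []).append(t)
leaf_prediction = {key: mean(values) for key, values in leaves.items()}

tree_errors = sum(leaf_prediction[(a, b)] != t for a, b, t in samples)

print("marginal covariances:", marginal_cov)
print("misclassified by the two-level partition:", tree_errors)
```

In this toy setting neither SNP carries any marginal signal, so per-SNP additive weights have nothing to work with, while the joint partition separates the trait exactly. Real traits mix additive and interaction effects, so the contrast is far less clean in practice.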
Abstract
Genetic risk scores (GRS) are crucial tools for estimating an individual's genetic liability to various traits and diseases, computed as a weighted sum of trait‐associated allele counts. Traditionally, GRS models assume additive, linear effects of risk variants. However, complex traits often involve nonadditive interactions, such as epistasis, which are not captured by these conventional methods. In this study, we investigate the use of random forest (RF) models as a model‐free approach for constructing GRS, leveraging RF's capacity to capture complex, nonlinear interactions among genetic variants. Specifically, we introduce two new RF‐based GRS strategies to boost RF performance and to incorporate base data information if available, including (1) ctRF, which optimizes linkage disequilibrium (LD) clumping and p‐value thresholds within RF; and (2) wRF, which adjusts the chance of SNP…
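The classical additive GRS described in the abstract, a weighted sum of trait-associated allele counts, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, toy genotypes, weights, p-values, and the 1e-5 cutoff are hypothetical, and the simple p-value filter only gestures at the kind of thresholding that ctRF tunes. It does not implement ctRF or wRF, and it omits LD clumping entirely.

```python
from typing import List, Sequence


def additive_grs(allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (each count is 0, 1, or 2)."""
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(w * g for w, g in zip(weights, allele_counts))


def threshold_variants(p_values: Sequence[float], cutoff: float) -> List[int]:
    """Indices of variants whose base-data p-value is at or below the cutoff."""
    return [i for i, p in enumerate(p_values) if p <= cutoff]


# Hypothetical toy inputs for one individual and four variants.
genotype = [0, 1, 2, 1]               # risk-allele counts
betas = [0.10, -0.05, 0.20, 0.30]     # per-variant weights from base data
p_vals = [1e-8, 0.2, 3e-6, 0.04]      # base-data association p-values

kept = threshold_variants(p_vals, cutoff=1e-5)
score = additive_grs([genotype[i] for i in kept], [betas[i] for i in kept])

print("variants kept:", kept)
print("additive GRS:", score)
```

Because every variant contributes independently through its own weight, a score of this form cannot represent epistasis, which is the limitation that motivates the RF-based alternatives.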
Taxonomy
Topics: Genetic Associations and Epidemiology · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
