Evaluating genetic-based disease prediction approaches through simulation
Max Shpak, Eric Parfitt, Soroush Mahmoudiandehkordi, Mehdi Maadooliat, Steven J. Schrodi

TL;DR
This study uses simulations to compare how well different machine learning models predict disease risk based on genetic data.
Contribution
The study introduces a systematic simulation framework to evaluate genetic-based disease prediction models under various inheritance modes.
Findings
Random forest models outperformed other classifiers in predicting disease phenotypes across different inheritance modes.
AUC had a curvilinear relationship with the difference in polygenic risk scores (PRS) between cases and controls.
Odds-risk models better estimate AUC-PRS associations for small genetic effects, while liability threshold models are better for strong effects.
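One way to see why a relationship between AUC and a case-control score difference would be curvilinear is the textbook binormal case. The sketch below is not the paper's model: it assumes the scores of cases and controls are both normally distributed with the same standard deviation, in which case AUC equals the standard normal CDF evaluated at delta / (sd * sqrt(2)). The names binormal_auc, delta, and sd are hypothetical, chosen for illustration.

```python
from statistics import NormalDist


def binormal_auc(delta, sd=1.0):
    """AUC for two equal-variance normal score distributions.

    delta is the mean score of cases minus the mean score of controls,
    and sd is the shared standard deviation (both are assumptions of
    this sketch, not quantities taken from the paper).
    """
    return NormalDist().cdf(delta / (sd * 2 ** 0.5))


# AUC starts at 0.5 when the groups do not differ and flattens toward 1
# as the gap grows, so equal steps in delta buy less and less AUC.
for delta in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"delta={delta:.1f}  AUC={binormal_auc(delta):.3f}")
```

Under these assumptions the curve is bounded between 0.5 and 1, which is enough to rule out a straight-line relationship over a wide range of effect sizes.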
Abstract
Common diseases exhibit substantial heritability, and genome-wide association studies (GWAS) of these diseases have revealed hundreds of thousands of high-frequency disease susceptibility variants throughout the genome. These studies offer the prospect of using genomic data to improve disease prediction and diagnosis; however, the relative performance of different predictive modeling approaches is not well characterized. To investigate this systematically, we constructed a Monte Carlo simulation generating model genomes with 500 SNPs carrying risk alleles that are parameterized by the strength of their effects and by different modes of inheritance: additive, dominant, recessive, and combinations thereof. After generating genotypes for cases and controls, several machine learning classifiers (logistic regression, naïve Bayes, random forests, and neural networks, with and without feature selection) were applied to predict…
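The simulation design described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: the function names (encode, simulate, auc), a single shared allele frequency and effect size for all SNPs, 50 SNPs instead of 500 to keep the run short, a logistic (odds-based) disease model, sampling a cohort and reading off case status rather than generating cases and controls separately, and scoring individuals by their true weighted risk score instead of fitting any of the classifiers listed above.

```python
import math
import random


def encode(genotype, mode):
    """Map a risk-allele count (0, 1, or 2) to a risk dosage."""
    if mode == "additive":
        return genotype                    # each copy adds risk
    if mode == "dominant":
        return 1 if genotype >= 1 else 0   # one copy is enough
    if mode == "recessive":
        return 1 if genotype == 2 else 0   # both copies are required
    raise ValueError(f"unknown mode: {mode}")


def simulate(n_people, n_snps, freq, beta, mode, rng):
    """Draw genotypes and disease status; return (scores, labels)."""
    # Expected dosage per SNP under Hardy-Weinberg proportions, used to
    # center the score so that roughly half the cohort become cases.
    probs = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    mean_dosage = sum(p * encode(g, mode) for g, p in enumerate(probs))
    scores, labels = [], []
    for _ in range(n_people):
        # Two independent allele draws per SNP give a count of 0, 1, or 2.
        genotypes = [(rng.random() < freq) + (rng.random() < freq)
                     for _ in range(n_snps)]
        dosage = sum(encode(g, mode) for g in genotypes)
        score = beta * (dosage - n_snps * mean_dosage)
        risk = 1.0 / (1.0 + math.exp(-score))   # logistic disease model
        scores.append(score)
        labels.append(rng.random() < risk)
    return scores, labels


def auc(scores, labels):
    """Probability that a random case outscores a random control."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


rng = random.Random(0)
for mode in ("additive", "dominant", "recessive"):
    scores, labels = simulate(1000, 50, 0.3, 0.2, mode, rng)
    print(f"{mode:9s}  cases={sum(labels):4d}  AUC={auc(scores, labels):.3f}")
```

Swapping the score-based ranking for a fitted classifier, and varying the effect size and the mix of inheritance modes across SNPs, would bring the sketch closer to the comparison the abstract describes.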
Taxonomy
Topics: Genetic Associations and Epidemiology · Genomics and Rare Diseases · Genetic and phenotypic traits in livestock
