Benchmarking end-to-end genotype-to-phenotype prediction workflows across 80 openSNP phenotypes
Muhammad Muneeb, David B. Ascher, YooChan Myung, Samuel F. Feng, Andreas Henschel

TL;DR
This study benchmarks various genotype-to-phenotype prediction workflows across 80 phenotypes in openSNP, revealing performance variability and the importance of workflow choice depending on phenotype and data characteristics.
Contribution
It provides a comprehensive comparison of machine learning, deep learning, and polygenic score methods in a realistic, heterogeneous genomic dataset, highlighting their strengths and limitations.
Findings
Polygenic score workflows achieved the highest observed discrimination for 53 of the 80 phenotypes.
Machine-learning or deep-learning workflows achieved the highest observed discrimination for the remaining 27.
Performance varies significantly by phenotype and preprocessing choices.
Abstract
Genotype-to-phenotype prediction is a central goal of statistical genetics, yet practical comparisons of prediction workflows remain limited in small, heterogeneous, participant-shared genomic datasets. Here, we benchmarked end-to-end case-control prediction across 80 curated binary phenotypes from openSNP using machine learning, deep learning, and polygenic score workflows. We evaluated 29 machine-learning algorithms, 80 deep-learning model variants, and 3 polygenic score tools across 675 clumping and pruning configurations. No workflow family dominated universally. Polygenic score workflows achieved the highest observed discrimination for 53 phenotypes, whereas machine-learning or deep-learning workflows achieved the highest for 27. However, many apparent phenotype-level wins were modest, with 41.2% of comparisons representing practical ties within five discrimination points.…
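The win counts and the "practical tie" share reported in the abstract can be illustrated with a small tally. The sketch below is not the authors' code. It assumes that discrimination is an AUC expressed on a 0-100 scale, that "five discrimination points" means a gap of at most 5 between the best and second-best workflow family, and that each phenotype contributes one comparison. The function name `summarize`, the family labels, and the toy scores are all hypothetical.

```python
from collections import Counter

# Assumption: "five discrimination points" = 5 AUC points on a 0-100 scale.
TIE_MARGIN = 5.0


def summarize(scores, tie_margin=TIE_MARGIN):
    """Count per-family wins and the fraction of practical ties.

    scores: {phenotype: {workflow_family: best AUC on a 0-100 scale}}
    Returns (Counter of wins per family, fraction of phenotypes whose
    top two families differ by at most tie_margin).
    """
    wins = Counter()
    ties = 0
    for by_family in scores.values():
        # Rank families from highest to lowest discrimination.
        ranked = sorted(by_family.items(), key=lambda kv: kv[1], reverse=True)
        (best_family, best), (_, runner_up) = ranked[0], ranked[1]
        wins[best_family] += 1
        if best - runner_up <= tie_margin:
            ties += 1
    return wins, ties / len(scores)


# Toy, made-up scores for three hypothetical phenotypes (not from the paper).
toy = {
    "phenotype_a": {"PGS": 71.0, "ML": 64.0, "DL": 60.0},
    "phenotype_b": {"PGS": 58.0, "ML": 61.0, "DL": 59.5},
    "phenotype_c": {"PGS": 66.0, "ML": 65.0, "DL": 52.0},
}

wins, tie_fraction = summarize(toy)
print(dict(wins), round(tie_fraction, 3))
```

In this toy input, PGS wins two phenotypes and ML wins one, while two of the three wins fall inside the assumed tie margin. It shows how a raw win count can overstate how decisively one family outperforms another.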
