EFGPP: Exploratory framework for genotype-phenotype prediction
Muhammad Muneeb, David B. Ascher

TL;DR
EFGPP is a reproducible framework that integrates diverse genetic, clinical, and molecular data sources to improve genotype-to-phenotype prediction, demonstrated on migraine data from the UK Biobank.
Contribution
The paper introduces EFGPP, a framework for generating, ranking, and combining heterogeneous data types to improve genotype-to-phenotype prediction.
Findings
Combining data types raised test AUC from 0.644 (best single data type) to 0.688 with migraine-focused inputs.
Genotype-derived features outperformed polygenic risk scores alone.
Cross-trait inputs derived from depression GWAS also carried predictive signal, reaching a combined test AUC of 0.663.
Abstract
Predicting complex human traits from genetic data is challenging because different genetic, clinical, and molecular data sources often contain different parts of the signal. Here, we present EFGPP, a reproducible framework for generating, ranking, and combining multiple types of data for genotype-to-phenotype prediction. We applied EFGPP to migraine prediction using UK Biobank data from 733 individuals. The framework combined genotype-derived features, principal components, clinical and metabolomic covariates, and polygenic risk scores generated from migraine and depression GWAS using PLINK, PRSice-2, AnnoPred, and LDAK-GWAS. The best single data type achieved a test AUC of 0.644, while combining multiple data types improved performance to 0.688 using migraine-focused inputs and 0.663 using cross-trait depression-derived inputs. Genetic features alone did not outperform the…
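To make the single-versus-combined comparison in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each data type has already been reduced to one score per individual, and it combines data types by averaging z-scored scores, which is an illustrative stand-in for whatever model EFGPP actually uses. The data are synthetic, and the names `prs_like` and `covariate_like` are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def zscore(values):
    """Standardize a list of scores to mean 0 and unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the scores are not all equal
    return [(v - mean) / sd for v in values]


def combine(blocks):
    """Average the z-scored per-block scores for each individual.

    Assumption: a plain average stands in for the real combination step.
    """
    scaled = [zscore(block) for block in blocks]
    return [statistics.fmean(column) for column in zip(*scaled)]


# Synthetic cohort: two weakly informative, independent score blocks.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(400)]
prs_like = [0.5 * y + rng.gauss(0, 1) for y in labels]        # hypothetical PRS block
covariate_like = [0.5 * y + rng.gauss(0, 1) for y in labels]  # hypothetical covariate block
combined = combine([prs_like, covariate_like])

for name, scores in [
    ("prs_like", prs_like),
    ("covariate_like", covariate_like),
    ("combined", combined),
]:
    print(name, round(auc(labels, scores), 3))
```

In a real analysis the scaling and any combination weights would be fitted on training data only and the AUC reported on a held-out test set, as the abstract's "test AUC" implies; the sketch skips that split for brevity.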
