Identifying genes associated with phenotypes using machine and deep learning
Muhammad Muneeb, David B. Ascher, YooChan Myung

TL;DR
This paper presents a machine-learning and deep-learning pipeline that classifies individuals as cases or controls from genotype data and uses feature importance to identify phenotype-associated genes, reporting a mean gene identification ratio (GIR) of 0.84 across phenotypes.
Contribution
The study introduces a novel ML/DL pipeline for gene identification that outperforms traditional methods in classifying phenotypes and prioritizing associated genes.
Findings
Mean gene identification ratio (GIR) of 0.84 across phenotypes
ML/DL-selected SNPs align with GWAS-identified SNPs
High classification performance metrics (AUC, F1, MCC)
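The three classification metrics named above can be computed from case/control labels with nothing beyond the standard library. The following is a minimal sketch, not the authors' evaluation code: the function names and the six-person toy example are assumptions made for illustration, and the 0.5 decision threshold is arbitrary.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = case and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (made-up labels and scores, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary 0.5 threshold

print(f"AUC={auc(y_true, scores):.3f}  F1={f1_score(y_true, y_pred):.3f}  MCC={mcc(y_true, y_pred):.3f}")
# prints: AUC=0.889  F1=0.667  MCC=0.333
```

AUC is computed from the continuous scores, while F1 and MCC depend on the chosen threshold, which is one reason to report all three side by side.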
Abstract
Identifying disease-associated genes enables the development of precision medicine and the understanding of biological processes. Genome-wide association studies (GWAS), gene expression data, biological pathway analysis, and protein network analysis are among the techniques used to identify causal genes. We propose a machine-learning (ML) and deep-learning (DL) pipeline to identify genes associated with a phenotype. The proposed pipeline consists of two interrelated processes. The first is classifying people into case/control based on the genotype data. The second is calculating feature importance to identify genes associated with a particular phenotype. We considered 30 phenotypes from the openSNP data for analysis, 21 ML algorithms, and 80 DL algorithms and variants. The best-performing ML and DL models, evaluated by the area under the curve (AUC), F1 score, and Matthews correlation…
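The two-step idea in the abstract (classify cases versus controls from genotypes, then read gene associations off the feature importances) can be sketched on synthetic data. Everything specific here is an assumption made for illustration: the gene names, the SNP-to-gene map, the single causal SNP, and the use of a plain logistic regression with absolute weights as the importance score. The paper evaluates many ML and DL models, and its exact importance measure is not given in the text above.

```python
import math
import random

# Hypothetical SNP-to-gene annotation (illustrative names, not from the paper).
SNP_TO_GENE = {0: "GENE_A", 1: "GENE_A", 2: "GENE_B", 3: "GENE_B", 4: "GENE_C", 5: "GENE_C"}
CAUSAL_SNP = 3  # assumption: a single SNP drives the synthetic phenotype

def simulate(n_people, rng):
    """Synthetic genotypes (0/1/2 allele counts) and case/control labels."""
    X, y = [], []
    for _ in range(n_people):
        row = [rng.choice([0, 1, 2]) for _ in SNP_TO_GENE]
        p_case = 0.1 + 0.4 * row[CAUSAL_SNP]  # risk of 0.1, 0.5, or 0.9
        X.append([g - 1 for g in row])        # center allele counts at 0
        y.append(1 if rng.random() < p_case else 0)
    return X, y

def train_logistic(X, y, epochs=200, lr=0.5):
    """Step 1: fit a case/control classifier by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_genes(weights):
    """Step 2: SNP importance = |weight|; a gene is scored by its strongest SNP."""
    scores = {}
    for snp, wj in enumerate(weights):
        gene = SNP_TO_GENE[snp]
        scores[gene] = max(scores.get(gene, 0.0), abs(wj))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rng = random.Random(0)  # fixed seed so the run is reproducible
X, y = simulate(400, rng)
weights, bias = train_logistic(X, y)
ranked = rank_genes(weights)
print(ranked)
```

In this toy setup the gene containing the causal SNP (GENE_B) is expected to rank first by a wide margin, because the other SNPs are pure noise. Real genotype data adds linkage disequilibrium, population structure, and far more SNPs than samples, so a ranking like this is only a starting point.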
Taxonomy
Topics: Genetic Associations and Epidemiology · Genomics and Rare Diseases · Bioinformatics and Genomic Networks
