PhenoKG: Knowledge Graph-Driven Gene Discovery and Patient Insights from Phenotypes Alone
Kamilia Zaripova, Ege Özsoy, Nassir Navab, Azade Farshad

TL;DR
This paper introduces PhenoKG, a graph-based model that combines a rare disease knowledge graph with graph neural networks and transformers to predict causative genes from patient phenotypes, with or without a candidate gene list. On the MyGene2 dataset it raises MRR from 19.02% (SHEPHERD) to 24.64%.
Contribution
The novel integration of a rare disease knowledge graph with graph neural networks and transformers for gene prediction from phenotypes, including cases without candidate gene lists.
Findings
Achieves 24.64% MRR and 33.64% nDCG@100 on the real-world MyGene2 dataset
Outperforms the previous best baseline, SHEPHERD (19.02% MRR, 30.54% nDCG@100)
Demonstrates effective generalization to phenotype-only cases
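
The two ranking metrics reported above can be illustrated with a minimal sketch. It assumes each patient has exactly one causative gene and binary relevance, which may differ from the paper's actual evaluation protocol; the gene names, patient data, and function names are hypothetical.

```python
import math

def reciprocal_rank(ranked_genes, causal_gene):
    """1/rank of the causal gene in the ranked list, or 0 if it is absent."""
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene == causal_gene:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_genes, causal_gene, k=100):
    """nDCG@k with a single relevant item (assumption).

    The ideal DCG is then 1/log2(2) = 1, so nDCG reduces to
    1/log2(rank + 1) when the causal gene is in the top k, else 0.
    """
    for rank, gene in enumerate(ranked_genes[:k], start=1):
        if gene == causal_gene:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def mean_metric(metric, patients, **kwargs):
    """Average a per-patient metric over (ranking, causal_gene) pairs."""
    return sum(metric(r, g, **kwargs) for r, g in patients) / len(patients)

# Hypothetical toy data: (model ranking, true causal gene) per patient.
patients = [
    (["GENE_A", "GENE_B", "GENE_C"], "GENE_A"),  # causal gene at rank 1
    (["GENE_D", "GENE_E", "GENE_F"], "GENE_E"),  # causal gene at rank 2
    (["GENE_G", "GENE_H", "GENE_I"], "GENE_Z"),  # causal gene not ranked
]

mrr = mean_metric(reciprocal_rank, patients)
ndcg = mean_metric(ndcg_at_k, patients, k=100)
print(f"MRR = {mrr:.4f}, nDCG@100 = {ndcg:.4f}")
```

Under these assumptions, MRR rewards only the position of the first correct gene, while nDCG@100 applies a logarithmic discount and ignores anything ranked below position 100.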
Abstract
Identifying causative genes from patient phenotypes remains a significant challenge in precision medicine, with important implications for the diagnosis and treatment of genetic disorders. We propose a novel graph-based approach for predicting causative genes from patient phenotypes, with or without an available list of candidate genes, by integrating a rare disease knowledge graph (KG). Our model, combining graph neural networks and transformers, achieves substantial improvements over the current state-of-the-art. On the real-world MyGene2 dataset, it attains a mean reciprocal rank (MRR) of 24.64% and nDCG@100 of 33.64%, surpassing the best baseline (SHEPHERD) at 19.02% MRR and 30.54% nDCG@100. We perform extensive ablation studies to validate the contribution of each model component. Notably, the approach generalizes to cases where only phenotypic data are available, addressing…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks
