# MENDELSEEK: An algorithm that predicts Mendelian genes and elucidates what makes them special

**Authors:** Hongyi Zhou, Brice Edelman, Jeffrey Skolnick

PMC · DOI: 10.1371/journal.pcbi.1013992 · PLOS Computational Biology · 2026-02-17

## TL;DR

MENDELSEEK is a machine learning tool that accurately predicts genes responsible for rare Mendelian diseases and reveals their unique biochemical features.

## Contribution

MENDELSEEK outperforms existing methods by integrating residue variation scores with pathway and protein features to predict Mendelian genes.

## Key findings

- In 10-fold cross-validation across 16,946 human genes, MENDELSEEK achieved an AUC of 0.869 and an AUPR of 0.737, ahead of ENTPRISE+ENTPRISE-X (AUC 0.781; AUPR 0.626) and REVEL (AUC 0.585; AUPR 0.401).
- Mendelian genes engage in significantly more protein-protein interactions than non-Mendelian genes and are evolutionarily ancient.
- Applied to the full set of 17,858 human genes, MENDELSEEK predicted 1,277 novel Mendelian gene candidates with precision greater than 0.7.
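
The "precision greater than 0.7" criterion can be read as a score cutoff calibrated on labeled genes and then applied to unlabeled ones. The sketch below illustrates that reading only; it is an assumption, not the authors' procedure. The function name `cutoff_for_precision`, the toy scores, and the gene names `GENE_A` to `GENE_C` are all hypothetical.

```python
def cutoff_for_precision(scores, labels, min_precision):
    """Return the lowest score cutoff such that genes scoring at or above it
    have precision > min_precision on the labeled set (None if no cutoff works)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best = None
    true_positives = 0
    for k, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Evaluate precision only once a group of tied scores is complete.
        if k < len(ranked) and ranked[k][0] == score:
            continue
        if true_positives / k > min_precision:
            best = score
    return best

# Toy labeled data (hypothetical): 1 = known Mendelian gene, 0 = non-Mendelian.
labeled_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
labeled_truth = [1, 1, 0, 1, 0, 0]
cutoff = cutoff_for_precision(labeled_scores, labeled_truth, 0.7)

# Apply the cutoff to unlabeled genes (hypothetical names and scores).
unlabeled = {"GENE_A": 0.85, "GENE_B": 0.65, "GENE_C": 0.72}
candidates = sorted(g for g, s in unlabeled.items() if s >= cutoff)
print(cutoff, candidates)
```

On the toy data, the top four labeled genes contain three true positives (precision 0.75), so the cutoff lands at the fourth score and two unlabeled genes pass it.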

## Abstract

Although individual Mendelian diseases—those caused by a single gene—are rare, their collective disease burden is substantial. Identifying the causal gene for each condition is essential for accurate diagnosis and effective treatment. Yet, despite decades of research, the genetic basis of more than half of all known Mendelian diseases remains unresolved. To address this gap, we introduce MENDELSEEK, a machine learning framework that predicts Mendelian genes by integrating residue variation scores with pathway participation, Gene Ontology processes, and protein language model features. In benchmarking across 16,946 human genes with 10-fold cross-validation, MENDELSEEK achieved an AUC of 0.869 and an AUPR of 0.737—substantially outperforming the next best methods, ENTPRISE+ENTPRISE-X (AUC 0.781; AUPR 0.626), and REVEL (AUC 0.585; AUPR 0.401). When applied to the full set of 17,858 human genes, MENDELSEEK predicted 1,277 novel Mendelian gene candidates with precision greater than 0.7. Analysis further revealed that Mendelian genes engage in significantly more protein-protein interactions than non-Mendelian genes and are evolutionarily ancient. Together, these results highlight MENDELSEEK as a major advance over existing methods, offering new insights into the biochemical features that distinguish Mendelian from non-Mendelian genes.
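
For readers unfamiliar with the two benchmark metrics, here is a minimal pure-Python sketch of how AUC and AUPR can be computed from gene-level scores. It is illustrative rather than the authors' evaluation code: AUPR is approximated by average precision (a common estimator that ignores tied scores), and the toy scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / k
    return total / true_positives

# Toy example (hypothetical): 1 = Mendelian gene, 0 = non-Mendelian gene.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 1, 0]
auc = roc_auc(toy_scores, toy_labels)
aupr = average_precision(toy_scores, toy_labels)
print(round(auc, 3), round(aupr, 3))
```

In a 10-fold cross-validation setting, the scores passed to these functions would be out-of-fold predictions, so each gene is scored by a model that never saw its label.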

A patient with a rare Mendelian disease can have hundreds of mutated genes. Identifying which gene causes the disease is crucial for accurate diagnosis and treatment, and for understanding more complex diseases. However, despite decades of effort, the genetic causes of over half of identified Mendelian diseases are unknown. To address this, we describe MENDELSEEK, a machine learning approach that predicts Mendelian genes by integrating each gene’s aggregate residue variation score with properties such as its pathways, Gene Ontology processes, and protein language model features. We show that MENDELSEEK performs significantly better than state-of-the-art approaches such as DeepMind’s AlphaMissense in distinguishing Mendelian from non-Mendelian genes. We also find that Mendelian genes have significantly more protein-protein interactions than non-Mendelian genes and are evolutionarily ancient. The pathways and Gene Ontology processes most relevant to Mendelian genes are identified through comprehensive U-test analysis. We further applied MENDELSEEK to the whole human genome and predicted 1,277 novel Mendelian genes with a precision >0.7. This work not only helps clarify which pathways and molecular processes cause a given Mendelian disease but also, by filtering out hundreds of genes falsely identified by other methods, provides valuable guidance to geneticists.
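
Assuming "U-test" here means the Mann-Whitney U test, the comparison of a feature (for example, protein-protein interaction counts) between Mendelian and non-Mendelian genes can be sketched as follows. This is a simplified illustration, not the authors' analysis: the p-value uses a normal approximation without tie or continuity correction, and the interaction counts are made up.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for sample x, two-sided p-value).

    Uses midranks for ties and a normal approximation for the p-value,
    with no tie or continuity correction (a simplifying assumption)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Hypothetical interaction counts per gene, for illustration only.
mendelian_ppi = [12, 15, 20, 30, 45]
non_mendelian_ppi = [1, 2, 3, 5, 8]
u_stat, p_val = mann_whitney_u(mendelian_ppi, non_mendelian_ppi)
print(u_stat, p_val)
```

Dividing U by the product of the two sample sizes gives the probability that a randomly chosen Mendelian gene has the larger value, which is the same quantity as an AUC for that single feature.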

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Genes:** ITGB1 (integrin subunit beta 1) [NCBI Gene 3688] {aka CD29, FNRB, GPIIA, MDF2, MSK12, VLA-BETA}, COL1A1 (collagen type I alpha 1 chain) [NCBI Gene 1277] {aka CAFYD, EDSARTH1, EDSC, OI1, OI2, OI3}, ND6 (NADH dehydrogenase subunit 6) [NCBI Gene 4541] {aka MTND6}, SORL1 (sortilin related receptor 1) [NCBI Gene 6653] {aka C11orf32, LR11, LRP9, SORLA, SorLA-1, gp250}, COL2A1 (collagen type II alpha 1 chain) [NCBI Gene 1280] {aka ACG2, ANFH, ANFH1, AOM, COL11A3, EDMMD}, RPE65 (retinoid isomerohydrolase RPE65) [NCBI Gene 6121] {aka BCO3, LCA2, RP20, mRPE65, p63, rd12}, OAT (ornithine aminotransferase) [NCBI Gene 4942] {aka GACR, HOGA, OATASE, OKT}, RDH11 (retinol dehydrogenase 11) [NCBI Gene 51109] {aka ARSDR1, CGI82, HCBP12, MDT1, PSDR1, RALR1}, BBS4 (Bardet-Biedl syndrome 4) [NCBI Gene 585], INSR (insulin receptor) [NCBI Gene 3643] {aka CD220, HHF5}, HSD17B6 (hydroxysteroid 17-beta dehydrogenase 6) [NCBI Gene 8630] {aka HSE, RODH, SDR9C6}
- **Diseases:** Mendelian disorder (MESH:D025861), Retinal dystrophy (MESH:D058499), LHON (MESH:D029242), rare diseases (MESH:D035583), Leber congenital amaurosis (MESH:D057130), phalangeal epiphyseal dysplasia (MESH:C565179), Gyrate atrophy of choroid and retina (MESH:D015799), eye diseases (MESH:D005128), Bardet-Biedl syndrome (MESH:D020788), Spondyloperipheral dysplasia (MESH:C535799), dementia (MESH:D003704), kidney/renal diseases (MESH:D007674), genital abnormalities (MESH:D014564), Osteogenesis imperfecta, type I (MESH:D010013), Vitreoretinopathy with (MESH:D018630), CORDs (MESH:D000071700), cancers (MESH:D009369), mitochondrial disease (MESH:D028361), Ehlers-Danlos syndrome (MESH:D004535), -Mendelian diseases (MESH:D030342), Czech dysplasia (MESH:C535766), non-Mendelian diseases (MESH:D000073296), complex (MESH:D048090), Chondrogenesis, type II or hypochondrogenesis (MESH:C563007), LCA (MESH:D020326), Caffey disease (MESH:D006958), Retinitis pigmentosa (MESH:D012174)
- **Chemicals:** urea (MESH:D014508), amino acids (MESH:D000596), Ornithine (MESH:D009952), aldehyde (MESH:D000447)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12956125/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12956125/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12956125/full.md

---
Source: https://tomesphere.com/paper/PMC12956125