EnTao-GPM: DNA Foundation Model for Predicting the Germline Pathogenic Mutations
Zekai Lin, Haoran Sun, Yucheng Guo, Yujie Yang, Yanwen Wang, Bozhen Hu, Chonghang Ye, Qirong Yang, Fan Zhong, Xiaoming Zhang, Lei Liu

TL;DR
EnTao-GPM is a novel DNA foundation model that uses cross-species pre-training and fine-tuning to improve the prediction and interpretation of germline pathogenic mutations for clinical and research applications.
Contribution
It combines cross-species pre-training, mutation-specific fine-tuning, and an interpretable framework to improve mutation classification.
Findings
Demonstrates superior mutation-classification accuracy when validated against ClinVar.
Enables faster and more accurate genetic testing.
Provides interpretable insights for clinical decision-making.
Abstract
Distinguishing pathogenic mutations from benign polymorphisms remains a critical challenge in precision medicine. EnTao-GPM, developed by Fudan University and BioMap, addresses this through three innovations: (1) Cross-species targeted pre-training on disease-relevant mammalian genomes (human, pig, mouse), leveraging evolutionary conservation to enhance interpretation of pathogenic motifs, particularly in non-coding regions; (2) Germline mutation specialization via fine-tuning on ClinVar and HGMD, improving accuracy for both SNVs and non-SNVs; (3) Interpretable clinical framework integrating DNA sequence embeddings with LLM-based statistical explanations to provide actionable insights. Validated against ClinVar, EnTao-GPM demonstrates superior accuracy in mutation classification. It revolutionizes genetic testing by enabling faster, more accurate, and accessible interpretation for…
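The abstract does not specify how variants are presented to the model, so the following is only a minimal sketch of one common way to prepare input for a DNA sequence model: apply the variant to a reference sequence, cut matching windows around the variant site, and split each window into overlapping k-mers. The function names (`apply_variant`, `window`, `kmers`, `variant_inputs`), the flank size, and the k-mer tokenization are all illustrative assumptions, not EnTao-GPM's actual pipeline. The same code path handles SNVs and simple non-SNVs such as short deletions.

```python
from typing import List, Tuple


def apply_variant(reference: str, pos: int, ref: str, alt: str) -> str:
    """Replace the ref allele at 0-based position `pos` with the alt allele."""
    if reference[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    return reference[:pos] + alt + reference[pos + len(ref):]


def window(seq: str, center: int, flank: int) -> str:
    """Return up to `flank` bases on each side of `center`, clipped to the sequence."""
    start = max(0, center - flank)
    end = min(len(seq), center + flank + 1)
    return seq[start:end]


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (hypothetical tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def variant_inputs(
    reference: str, pos: int, ref: str, alt: str, flank: int = 2, k: int = 3
) -> Tuple[List[str], List[str]]:
    """Build k-mer tokens for the reference and mutated windows around a variant."""
    mutated = apply_variant(reference, pos, ref, alt)
    return kmers(window(reference, pos, flank), k), kmers(window(mutated, pos, flank), k)


if __name__ == "__main__":
    reference = "ACGTACGTACGT"
    # SNV: C>T at 0-based position 5
    ref_tokens, alt_tokens = variant_inputs(reference, 5, "C", "T")
    print(ref_tokens, alt_tokens)
    # Non-SNV: deletion of G after position 5 (CG>C)
    print(apply_variant(reference, 5, "CG", "C"))
```

In a setup like this, the reference and mutated token lists would each be embedded by the sequence model, and a downstream classifier would compare the two embeddings. That downstream step is omitted here because the source gives no detail on it.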
