DYNA: Disease-Specific Language Model for Variant Pathogenicity
Huixin Zhan, Zijun Zhang

TL;DR
DYNA is a disease-specific fine-tuning approach for genomic foundation models that improves variant effect prediction accuracy in clinical genetics, especially for rare and unseen variants.
Contribution
It introduces a Siamese neural network-based fine-tuning method that adapts existing genomic models for disease-specific variant effect prediction.
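As a rough illustration of what Siamese fine-tuning means here, the sketch below passes two variant representations through one shared encoder and scores the pair with a contrastive loss. The summary does not state DYNA's loss or architecture, so the contrastive objective, the `encode` projection head, the `margin` value, and the random stand-in features are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def encode(x, W):
    """Shared encoder: the same weights W embed both inputs (the Siamese part)."""
    return np.tanh(x @ W)

def contrastive_loss(z1, z2, same_label, margin=1.0):
    """Pull same-label pairs together; push different-label pairs at least `margin` apart."""
    d = np.linalg.norm(z1 - z2, axis=-1)                       # pairwise Euclidean distance
    pos = same_label * d ** 2                                  # penalty for distant same-label pairs
    neg = (1 - same_label) * np.maximum(0.0, margin - d) ** 2  # penalty for close different-label pairs
    return float(np.mean(pos + neg))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))             # hypothetical projection head (assumption)
x_a = rng.normal(size=(3, 8))           # stand-in for foundation-model features of variant A
x_b = rng.normal(size=(3, 8))           # stand-in for foundation-model features of variant B
same_label = np.array([1.0, 0.0, 1.0])  # 1 if both variants share a pathogenic/benign label

loss = contrastive_loss(encode(x_a, W), encode(x_b, W), same_label)
```

In an actual fine-tuning run the stand-in features would come from a genomic foundation model and `W` would be updated by gradient descent on this loss; both steps are omitted here.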
Findings
DYNA outperforms baseline models on rare variant testing sets.
Fine-tuned models show improved accuracy on ClinVar annotations.
DYNA is effective for both coding and non-coding variant effect prediction.
Abstract
Classifying genetic variants as pathogenic or benign remains a challenge in clinical genetics. Recently, genomic foundation models have improved generic variant effect prediction (VEP) accuracy via weakly supervised or unsupervised training. However, these VEPs are not disease-specific, limiting their adaptation at the point of care. To address this problem, we propose DYNA: Disease-specificity fine-tuning via a Siamese neural network, broadly applicable to all genomic foundation models, for more effective variant effect prediction in disease-specific contexts. We evaluate DYNA on two distinct disease-relevant tasks. For coding VEPs, we focus on various cardiovascular diseases, where gene-disease relationships of loss-of-function vs. gain-of-function dictate disease-specific VEP. For non-coding VEPs, we apply DYNA to an essential…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training · Focus
