DYNA: Disease-Specific Language Model for Variant Pathogenicity

Huixin Zhan; Zijun Zhang

arXiv:2406.00164·q-bio.GN·June 4, 2024·2 cites

Open Access

TL;DR

DYNA is a disease-specific fine-tuning approach for genomic foundation models that improves variant effect prediction accuracy in clinical genetics, especially for rare and unseen variants.

Contribution

It introduces a Siamese neural network-based fine-tuning method that adapts existing genomic models for disease-specific variant effect prediction.
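As a rough illustration of the Siamese idea, the sketch below passes a reference sequence and a variant sequence through one shared encoder and scores the pair with a standard contrastive loss. It is a generic sketch under stated assumptions, not the authors' implementation. The toy `encode` function stands in for a genomic foundation model, and the `WEIGHTS` table, the `margin` value, and the choice of contrastive loss are all hypothetical. It also assumes that benign variants should embed close to the reference and pathogenic variants far from it.

```python
import math

# Hypothetical stand-in for a genomic foundation model: a tiny per-base
# embedding table. In a Siamese setup, both inputs go through the SAME
# encoder with the SAME weights.
WEIGHTS = {"A": [1.0, 0.0], "C": [0.0, 1.0], "G": [-1.0, 0.0], "T": [0.0, -1.0]}


def encode(seq, weights):
    """Mean-pool per-base vectors into a single sequence embedding."""
    dim = len(next(iter(weights.values())))
    total = [0.0] * dim
    for base in seq:
        for i, w in enumerate(weights[base]):
            total[i] += w
    return [t / len(seq) for t in total]


def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance, is_benign, margin=1.0):
    """Standard contrastive loss on a pair distance.

    Assumption: benign pairs are pulled together (loss = d^2), while
    pathogenic pairs are pushed at least `margin` apart
    (loss = max(0, margin - d)^2).
    """
    if is_benign:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def siamese_pair_loss(ref_seq, alt_seq, is_benign, weights=WEIGHTS, margin=1.0):
    """Embed both sequences with the shared encoder, then score the pair."""
    d = euclidean(encode(ref_seq, weights), encode(alt_seq, weights))
    return contrastive_loss(d, is_benign, margin)


if __name__ == "__main__":
    # Toy single-base substitution: ACGT -> ACGA.
    print(siamese_pair_loss("ACGT", "ACGA", is_benign=True))
    print(siamese_pair_loss("ACGT", "ACGA", is_benign=False))
```

In an actual fine-tuning run, the encoder would be a pretrained model whose weights are updated by gradient descent on this kind of pairwise loss. The fixed table here only shows how the shared-weight pairing and the loss fit together.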

Findings

01. DYNA outperforms baseline models on rare variant test sets.

02. Fine-tuned models show improved accuracy on ClinVar annotations.

03. DYNA is effective for both coding and non-coding variant effect prediction.

Abstract

Clinical variant classification of pathogenic versus benign genetic variants remains a challenge in clinical genetics. Recently, the proposition of genomic foundation models has improved the generic variant effect prediction (VEP) accuracy via weakly-supervised or unsupervised training. However, these VEPs are not disease-specific, limiting their adaptation at the point of care. To address this problem, we propose DYNA: Disease-specificity fine-tuning via a Siamese neural network broadly applicable to all genomic foundation models for more effective variant effect predictions in disease-specific contexts. We evaluate DYNA in two distinct disease-relevant tasks. For coding VEPs, we focus on various cardiovascular diseases, where gene-disease relationships of loss-of-function vs. gain-of-function dictate disease-specific VEP. For non-coding VEPs, we apply DYNA to an essential…

Taxonomy

Topics: Biomedical Text Mining and Ontologies

Methods: Sparse Evolutionary Training · Focus