Integrating Large Language Models for Genetic Variant Classification
Youssef Boulaimen, Gabriele Fossi, Leila Outemzabet, Nathalie Jeanray, Oleksandr Levenets, Stephane Gerart, Sebastien Vachenc, Salvatore Raieli, Joanna Giemza

TL;DR
This paper explores the integration of advanced Large Language Models with genetic data to improve the classification of uncertain genetic variants, demonstrating significant accuracy improvements for clinical diagnostics.
Contribution
It introduces a comprehensive framework combining multiple LLMs and structural data, setting new benchmarks in genetic variant classification performance.
Findings
Models outperform existing tools on challenging variants
Significant accuracy improvements in ambiguous variant classification
Framework supports clinical deployment for personalized medicine
Abstract
The classification of genetic variants, particularly Variants of Uncertain Significance (VUS), poses a significant challenge in clinical genetics and precision medicine. Large Language Models (LLMs) have emerged as transformative tools in this realm. These models can uncover intricate patterns and predictive insights that traditional methods might miss, thus enhancing the predictive accuracy of genetic variant pathogenicity. This study investigates the integration of state-of-the-art LLMs, including GPN-MSA, ESM1b, and AlphaMissense, which leverage DNA and protein sequence data alongside structural insights to form a comprehensive analytical framework for variant classification. Our approach evaluates these integrated models using the well-annotated ProteinGym and ClinVar datasets, setting new benchmarks in classification performance. The models were rigorously tested on a set of…
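The abstract does not say how the individual model outputs are combined, so the following is only a minimal sketch of one common option: rank-normalize each predictor's scores, average them per variant, and compare the result against labels with AUROC. The model keys (`dna_lm`, `protein_lm`, `structure_aware`), the toy scores, and the labels are all made up for illustration and are not taken from the paper, ProteinGym, or ClinVar. The sketch also assumes every score is already oriented so that higher means more likely pathogenic, which is not true of every real tool's raw output.

```python
from typing import Dict, List, Sequence


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to average ranks scaled into [0, 1]; tied scores share a rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(n - 1, 1)
    return [r / denom for r in ranks]


def ensemble(model_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Average the rank-normalized scores of all models, variant by variant."""
    normalized = [rank_normalize(s) for s in model_scores.values()]
    return [sum(col) / len(col) for col in zip(*normalized)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (pathogenic, benign) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pathogenic, 0 = benign (not real annotations).
labels = [1, 1, 0, 0, 1, 0]
toy_scores = {
    "dna_lm": [0.9, 0.4, 0.3, 0.1, 0.8, 0.5],
    "protein_lm": [2.1, 1.7, 0.2, 0.9, 1.5, 0.4],
    "structure_aware": [0.7, 0.6, 0.5, 0.2, 0.9, 0.1],
}

combined = ensemble(toy_scores)
for name, scores in toy_scores.items():
    print(name, round(auroc(labels, scores), 3))
print("ensemble", round(auroc(labels, combined), 3))
```

Rank normalization is used here because the predictors report scores on different scales, and averaging raw values would let the widest-ranging model dominate. A learned combiner, such as a logistic regression over the per-model scores, would be another reasonable choice.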
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training
