# Predicting genetic evolution of viruses to identify suitable vaccines using artificial intelligence

**Authors:** Osama R. Shahin, Mohamed N. Ibrahim, Awadh Alanazi, Fahd S. Alharithi, Yasir Alruwaili, Ahmad A. Alzahrani, Eman Fawzy El Azab

PMC · DOI: 10.1038/s41598-026-35143-y · Scientific Reports · 2026-02-03

## TL;DR

This paper introduces an AI framework to predict viral evolution and improve vaccine development by analyzing genomic and structural data.

## Contribution

The novel R-DELF framework combines genomic, structural, and temporal intelligence to predict viral mutations and vaccine suitability more accurately than existing models.

## Key findings

- R-DELF achieves 99.2% accuracy and 99.4% F1 score in predicting viral mutations.
- The framework reports 97.92% precision and 98.89% recall, which the authors state exceed current AI-based virology models.
- It enables proactive vaccine development by predicting high-risk mutations in advance.
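For readers relating the reported metrics to one another, F1 is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard formula applied to the precision (97.92%) and recall (98.89%) given in the abstract. The function and variable names are illustrative, not taken from the paper. Under this definition the two figures yield roughly 98.4%, so the reported 99.4% F1 presumably reflects an averaging scheme or evaluation setting that this summary does not specify.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values as stated in the abstract, expressed as fractions.
reported_precision = 0.9792
reported_recall = 0.9889

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.4f}")
```

Because the harmonic mean always lies between its two inputs, this calculation also gives a quick sanity check when comparing metric tables across models.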

## Abstract

Rapid viral evolution is a growing global challenge for vaccine development, because new variants can often escape the immune system and reduce vaccine efficacy. Traditional genomic epidemiology relies on retrospective phylogenetic analysis, which can explain past mutations but cannot predict future evolutionary trends. To address these limitations, a new Refined Deep Evolutionary Learning Framework (R-DELF) is proposed that combines genomic, structural, and temporal intelligence to proactively predict viral mutations and assess vaccine suitability. The method uses an ESM-2 Transformer to extract structure-aware embeddings, which are merged with dual-attention Graph Neural Networks (GNNs) that learn phylogenetic and structural dependencies. An evolutionary learning maximiser improves adaptation modelling, and an Explainable AI (XAI) layer provides interpretability through residue-level attribution. In experiments, the framework achieves 99.2% accuracy, 97.92% precision, 98.89% recall, and 99.4% F1, exceeding current AI-based virology models. It is implemented in Python with TensorFlow, using genomic and protein data obtained from Kaggle. By predicting high-risk mutations in advance, the framework supports timely vaccine production and improves pandemic preparedness through data-driven forecasts of viral evolution.
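The abstract does not give the layer equations, so the following is only a toy sketch of what "dual attention" over phylogenetic and structural graphs could look like. It uses pure Python rather than TensorFlow to stay self-contained. Everything in it is an assumption made for illustration: the function names, the scaled dot-product scoring, the self-loops, the concatenation of the two views, and the tiny hand-written vectors standing in for ESM-2 embeddings. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_aggregate(embeddings, neighbors):
    """One attention-weighted aggregation step over a single edge set."""
    out = {}
    for node, vec in embeddings.items():
        # Self-loop (assumption): isolated nodes keep their own embedding.
        nbrs = [node] + list(neighbors.get(node, []))
        scale = math.sqrt(len(vec))
        weights = softmax([dot(vec, embeddings[n]) / scale for n in nbrs])
        out[node] = [
            sum(w * embeddings[n][i] for w, n in zip(weights, nbrs))
            for i in range(len(vec))
        ]
    return out


def dual_attention_layer(embeddings, phylo_edges, struct_edges):
    """Attend over both graphs separately, then concatenate the two views."""
    phylo = attention_aggregate(embeddings, phylo_edges)
    struct = attention_aggregate(embeddings, struct_edges)
    return {n: phylo[n] + struct[n] for n in embeddings}


# Hypothetical 2-d vectors standing in for per-variant ESM-2 embeddings.
toy_embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
toy_phylo = {"A": ["B"], "B": ["A"], "C": []}   # hypothetical lineage links
toy_struct = {"A": ["C"], "C": ["A"], "B": []}  # hypothetical structural links

fused = dual_attention_layer(toy_embeddings, toy_phylo, toy_struct)
for name, vec in fused.items():
    print(name, [round(v, 3) for v in vec])
```

Keeping the two edge sets separate until the final concatenation lets each attention pass weight neighbours by a different notion of relatedness. A real model would add learned projection matrices and a downstream classifier.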

## Full-text entities

- **Genes:** COIL (coilin) [NCBI Gene 8161] {aka CLN80, p80-coilin}, TTC41P (tetratricopeptide repeat domain 41, pseudogene) [NCBI Gene 253724] {aka GNN, GNNP}, S (surface glycoprotein) [NCBI Gene 43740568] {aka spike glycoprotein}, M (membrane glycoprotein) [NCBI Gene 43740571]
- **Diseases:** XAI (MESH:C538243), AI (MESH:C538142), COVID-19 (MESH:D000086382), influenza (MESH:D007251)
- **Chemicals:** amino acid (MESH:D000596), VOCs (-)
- **Species:** Human immunodeficiency virus 1 (no rank) [taxon 11676], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606], Orthomyxoviridae (family) [taxon 11308]
- **Cell lines:** ESM-2 — Homo sapiens (Human), Transformed cell line (CVCL_XI05)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12887023/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12887023/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12887023/full.md

---
Source: https://tomesphere.com/paper/PMC12887023