# Genomic evolution of SARS-CoV-2 delta variants pre- and post-omicron emergence using alignment-free machine learning models

**Authors:** Sathish Sankar, Kaushika Anandharaman, Pradeesh Selvam, Aswini Jayaraman, Deepak Jayakumar, Pachamuthu Balakrishnan, Marie Larsson, Vijayakumar Velu, Sivadoss Raju, Esaki M. Shankar

PMC · DOI: 10.1371/journal.pone.0345259 · PLOS One · 2026-03-19

## TL;DR

This study applies alignment-free, k-mer-based machine learning to 190 SARS-CoV-2 Delta variant spike gene sequences to characterize how the variant changed before and after the Omicron variant emerged.

## Contribution

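The study introduces an alignment-free framework that combines k-mer-based feature extraction with convolutional neural networks (CNN), K-means clustering, and random forest classification to detect genomic changes in Delta variant spike gene sequences.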

## Key findings

- The random forest model achieved 93% accuracy in distinguishing pre- and post-Omicron Delta variant spike gene sequences.
- Comparative analysis identified 157 persistent mutations and four vanished mutations in the post-Omicron Delta group.
- Cluster analysis showed notable shifts between the groups, which the authors interpret as stable yet evolving genomic patterns over time.
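
The persistent and vanished counts above come from comparing the mutations seen in the two groups. This summary does not define the terms, so the sketch below assumes that "persistent" means present in both the pre- and post-Omicron groups and "vanished" means present before but absent after; the paper's own definitions may differ. The function name `compare_mutations` and the mutation labels are hypothetical placeholders, not data from the study.

```python
def compare_mutations(pre_omicron, post_omicron):
    """Split mutations into categories by set comparison.

    Assumed definitions (not confirmed by the source):
      persistent = seen in both groups
      vanished   = seen pre-Omicron only
      new        = seen post-Omicron only
    """
    pre, post = set(pre_omicron), set(post_omicron)
    return {
        "persistent": sorted(pre & post),
        "vanished": sorted(pre - post),
        "new": sorted(post - pre),
    }


# Toy placeholder labels, invented purely for illustration.
pre_group = ["A10G", "C25T", "G40A"]
post_group = ["A10G", "G40A", "T55C"]

result = compare_mutations(pre_group, post_group)
print(result["persistent"])
print(result["vanished"])
```

Using sets keeps the comparison independent of sequence order and of how many sequences carry each mutation; a frequency threshold would need counts instead.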

## Abstract

The SARS-CoV-2 Delta variant (B.1.617.2), initially classified as a variant of concern due to its enhanced transmissibility and vaccine-escape mutations, underwent further genomic changes following the emergence of the Omicron variant (B.1.1.529). This study investigates the genomic differences in Delta variant spike gene sequences collected before and after the emergence of Omicron. A total of 190 sequences were analyzed using an alignment-free approach incorporating k-mer-based feature extraction and machine learning models, including convolutional neural networks (CNN), K-means clustering, and random forest classification. The random forest model achieved 93% accuracy, with significant F1 scores, effectively distinguishing the two Delta variant groups. Comparative analysis revealed 157 persistent mutations and four vanished mutations in the post-Omicron group. Cluster analysis showed notable shifts, indicating stable yet evolving genomic patterns over time. The study demonstrates the advantage of alignment-free methods in detecting subtle sequence variations that alignment-based approaches may overlook. These findings enhance our understanding of SARS-CoV-2 evolution and provide a framework for identifying key genomic signatures relevant to public health. The methodology and insights gained offer potential applications in variant surveillance, vaccine design, and viral evolutionary studies, supporting preparedness for future SARS-CoV-2 variant emergence.
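
To make the k-mer-based feature extraction mentioned in the abstract concrete, here is a minimal sketch of how a nucleotide sequence can be turned into a fixed-length numeric vector without alignment. The choice of k = 3, the normalisation to frequencies, the handling of ambiguous bases, and the function names are all assumptions made for illustration; this summary does not state the settings the authors used.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(k):
    """Return all 4**k k-mers over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_frequencies(sequence, k=3):
    """Return a normalised k-mer frequency vector for one sequence.

    Windows containing anything other than A/C/G/T (for example N)
    are not in the vocabulary and are therefore ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = kmer_vocabulary(k)
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


# Toy sequence, invented for illustration only.
features = kmer_frequencies("ACGTACGTNACG", k=3)
print(len(features))
```

Every sequence maps to a vector of the same length (4^k) regardless of its own length, which is what lets a classifier such as a random forest consume the sequences directly without a multiple sequence alignment.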

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Genes:** CAT (catalase) [NCBI Gene 847], VTN (vitronectin) [NCBI Gene 7448] {aka V75, VN, VNT}, S (surface glycoprotein) [NCBI Gene 43740568] {aka spike glycoprotein}, ACE2 (angiotensin converting enzyme 2) [NCBI Gene 59272] {aka ACEH}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}
- **Diseases:** infectious (MESH:D003141), hepatitis C (MESH:D019698), influenza A (MESH:D007251), COVID-19 (MESH:D000086382), hepatitis B (MESH:D006509), deaths (MESH:D003643)
- **Chemicals:** adenine (MESH:D000225), thymine (MESH:D013941), hydrogen (MESH:D006859), Gini (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Dengue virus group (clade) [taxon 11052], Human immunodeficiency virus 1 (no rank) [taxon 11676], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]
- **Mutations:** C643T, C2011G, C179T, T776A
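
The mutation labels above appear to follow the common nucleotide substitution notation of reference base, position, and alternate base. Assuming that reading, they can be parsed as sketched below. The helper `parse_substitution` is hypothetical, and this summary does not say whether positions are numbered relative to the spike gene or to the whole genome.

```python
import re

# Reference base, 1-based position, alternate base (assumed notation).
_SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_substitution(label):
    """Parse a label such as 'C643T' into (ref, position, alt)."""
    match = _SUBSTITUTION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a simple nucleotide substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


for label in ["C643T", "C2011G", "C179T", "T776A"]:
    print(label, parse_substitution(label))
```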

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13001964/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13001964/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC13001964/full.md

---
Source: https://tomesphere.com/paper/PMC13001964