# Stochastic mutation as a mechanism for the emergence of SARS-CoV-2 new variants

**Authors:** Liaofu Luo, Jun Lv

PMC · DOI: 10.1016/j.virusres.2025.199667 · Virus Research · 2025-11-20

## TL;DR

This study shows that random mutations in the spike protein of SARS-CoV-2 can explain the emergence of new virus variants and their evolutionary patterns.

## Contribution

A stochastic mutation model is introduced to predict the emergence of new SARS-CoV-2 macro-lineages based on spike protein mutations.

## Key findings

- Stochastic mutation in spike protein sites drives the emergence of new SARS-CoV-2 variants.
- Threshold values of x, the number of randomly generated spike mutation sites, delineate transitions between macro-lineages: as x increases, O surpasses N, P surpasses O, and Q surpasses P.
- Large-scale stochastic sampling reveals statistical patterns in the evolution of SARS-CoV-2.

## Abstract

- Constructed a cladogram elucidating evolutionary relationships among mutants.
- Deduced principles governing the emergence of novel macro-lineages.
- Demonstrated that stochastic mutation drives the emergence of new SARS-CoV-2 variants.

Predicting the future evolutionary trajectory of SARS-CoV-2 remains a critical challenge, particularly due to the pivotal role of spike protein mutations. It is therefore essential to develop evolutionary models capable of continuously integrating new experimental data. In this study, we employ a cladogram algorithm that incorporates established assumptions for mutant representation — using both four-letter and two-letter formats — along with an n-mer distance algorithm to construct a cladogenetic tree of SARS-CoV-2 mutations. This tree accurately captures the observed changes across macro-lineages. We introduce a stochastic method for generating new strains on this tree based on spike protein mutations. For a given set A of existing mutation sites, we define a set X comprising x randomly generated mutation sites on the spike protein. The intersection of A and X, denoted as set Y, contains y sites. Our analysis indicates that the position of a generated strain on the tree is primarily determined by x. Through large-scale stochastic sampling, we predict the emergence of new macro-lineages. As x increases, the dominance among macro-lineages shifts: lineage O surpasses N, P surpasses O, and eventually Q surpasses P. We identify threshold values of x that delineate transitions between these macro-lineages. Furthermore, we propose an algorithm for predicting the timeline of macro-lineage emergence. In conclusion, our findings demonstrate that SARS-CoV-2 evolution adheres to statistical principles: the emergence of new strains can be driven by randomly generated spike protein sites, and large-scale stochastic sampling reveals evolutionary patterns underlying the rise of distinct macro-lineages.
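The set construction described in the abstract (existing sites A, randomly generated sites X of size x, overlap Y = A ∩ X of size y) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the paper: X is drawn uniformly without replacement over the 1273 residue positions of the spike protein, the set `A` is a made-up placeholder, and the names `sample_overlap` and `mean_overlap` are hypothetical. The sketch covers only the sampling step; it does not reproduce the authors' cladogram, the n-mer distance, or the placement of generated strains on the tree.

```python
import random
from statistics import mean

SPIKE_LENGTH = 1273  # residues in the SARS-CoV-2 spike protein


def sample_overlap(existing_sites, x, rng, spike_length=SPIKE_LENGTH):
    """Draw X (x distinct random spike positions) and return (X, Y), Y = A & X.

    Assumption: positions are sampled uniformly without replacement.
    """
    if not 0 <= x <= spike_length:
        raise ValueError("x must lie between 0 and the spike length")
    generated = set(rng.sample(range(1, spike_length + 1), x))
    return generated, generated & set(existing_sites)


def mean_overlap(existing_sites, x, trials=5000, seed=0):
    """Estimate the average y = |Y| over repeated random draws of X."""
    rng = random.Random(seed)
    return mean(
        len(sample_overlap(existing_sites, x, rng)[1]) for _ in range(trials)
    )


# Hypothetical set A: 60 placeholder positions, not data from the paper.
A = set(range(300, 360))

for x in (20, 80, 200):
    # Under uniform sampling, y is hypergeometric with mean x * |A| / L.
    expected = x * len(A) / SPIKE_LENGTH
    print(f"x={x:3d}  simulated mean y={mean_overlap(A, x):.2f}  "
          f"hypergeometric mean={expected:.2f}")
```

Under the uniform-sampling assumption, y follows a hypergeometric distribution, so its average grows linearly with x. Any non-uniform weighting of spike positions would change that relationship, and the sketch makes no claim about which weighting the authors used.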

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12789824/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12789824/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12789824/full.md

---
Source: https://tomesphere.com/paper/PMC12789824