# Integrating genomic epidemiology and deep mutational scanning data for prevalence forecasting of SARS-CoV-2 Omicron lineages

**Authors:** Zhong-yi Lei, Xiao-min Zhang, Jia-lu Han, Ji-guo Xue, Jia-yi Xu, Zi-lin Ren, Yi-gang Tong, Xiao-chen Bo, Ming Ni

PMC · DOI: 10.1371/journal.pone.0335520 · 2025-11-03

## TL;DR

This paper introduces CoVPF, a model that combines genomic data and mutation effects to better predict the spread of SARS-CoV-2 Omicron variants.

## Contribution

The novel integration of genomic epidemiology and deep mutational scanning data, with emphasis on epistasis, improves lineage prevalence forecasting.

## Key findings

- In retrospective validation, CoVPF achieved 20.7% higher accuracy in predicting lineage prevalence than a previous study that used genomic data alone.
- Ignoring epistasis reduced forecasting accuracy by 43%, showing that the combinatorial effects of mutations must be modeled.
- CoVPF provided more accurate and timely forecasts for lineage expansions like EG.5.1 and XBB.1.5.

## Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to circulate, and the Omicron variants have mutated into over 2,500 lineages. Predicting the next prevalent lineages and the inflections of dominant lineages is of public health significance and research interest. A previous study integrated genomic data to forecast lineage prevalence but overlooked the functional aspects of mutations, while efforts to evaluate the functional effects of individual mutations have not been extended to the lineage level. Here, we propose CoVPF, a model integrating both genomic epidemiology and deep mutational scanning (DMS) data for the receptor binding domain (RBD) of the SARS-CoV-2 spike protein, to predict the prevalence of Omicron lineages. Retrospective validation demonstrated that CoVPF achieved 20.7% higher accuracy than the previous study. Furthermore, we found that accounting for epistasis was critical, as ignoring epistasis led to a 43% decrease in forecasting accuracy. Case studies showed that CoVPF delivered more accurate and timely forecasts for lineage expansions and inflections such as EG.5.1 and XBB.1.5. CoVPF provides a paradigm for integrating in vitro functional readouts of the virus and accounting for combinatorial effects of mutations in support of public health efforts in lineage prevalence forecasting.
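To make the epistasis point concrete, the sketch below contrasts two ways of turning per-mutation DMS effects into a lineage-level score: a purely additive sum, and a sum that also includes pairwise interaction terms. This is not CoVPF's implementation, which this summary does not describe. The mutation names (`mutA`, `mutB`, `mutC`), the effect values, and both function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

# Hypothetical single-mutation effects on an RBD phenotype (for example a
# binding score), as a DMS experiment might report them. Values are made up.
SINGLE_EFFECTS: Dict[str, float] = {"mutA": 0.5, "mutB": -0.25, "mutC": 0.25}

# Hypothetical pairwise epistasis terms: the extra effect observed when two
# mutations occur together, beyond the sum of their individual effects.
PAIR_EPISTASIS: Dict[FrozenSet[str], float] = {
    frozenset({"mutA", "mutB"}): 0.75,
}


def additive_score(mutations: Iterable[str]) -> float:
    """Lineage score that ignores epistasis: sum of single-mutation effects."""
    return sum(SINGLE_EFFECTS.get(m, 0.0) for m in set(mutations))


def epistatic_score(mutations: Iterable[str]) -> float:
    """Lineage score with pairwise epistasis added on top of the additive sum."""
    muts = sorted(set(mutations))
    score = additive_score(muts)
    for a, b in combinations(muts, 2):
        score += PAIR_EPISTASIS.get(frozenset({a, b}), 0.0)
    return score


if __name__ == "__main__":
    lineage = ["mutA", "mutB"]
    print("additive: ", additive_score(lineage))
    print("epistatic:", epistatic_score(lineage))
```

With these made-up numbers, the additive model treats `mutB` as mildly deleterious, while the pairwise term makes the `mutA` + `mutB` combination strongly favorable. A forecast built on the additive score alone would therefore rank that lineage very differently, which is the kind of discrepancy the reported 43% accuracy drop points to.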

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Genes:** S (surface glycoprotein) [NCBI Gene 43740568] {aka spike glycoprotein}
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12582474/full.md

---
Source: https://tomesphere.com/paper/PMC12582474