# Integrated disease model considering mutation-induced infection waves with COVID-19 cases

**Authors:** Seungho Baek, Haneol Cho, SangChul Lee, Myeongsu Yoo, Donghyok Kwon, KyuHwan Lee, Yeonju Kim, Chansoo Kim

PMC · DOI: 10.1371/journal.pone.0341667 · 2026-03-06

## TL;DR

The paper models successive COVID-19 waves as a sum of logistic curves, one per dominant variant, using daily confirmed cases from Our World in Data and variant prevalence from GISAID.

## Contribution

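An integrated model that sums logistic curves, each representing the cumulative infections of one dominant variant, and uses the Pruned Exact Linear Time (PELT) algorithm to justify when the separate curves can be summed: when variant dominance reaches approximately 50%.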
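
As a sketch of what summing logistic curves means, the integrated cumulative case count can be written as below. The notation ($C$, $K_i$, $r_i$, $t_i$, $n$) is assumed here for illustration and may differ from the paper's own symbols.

$$
C(t) = \sum_{i=1}^{n} \frac{K_i}{1 + e^{-r_i (t - t_i)}}
$$

Here $K_i$ is the final cumulative size of the wave attributed to dominant variant $i$, $r_i$ is its growth rate, and $t_i$ is its inflection time. Adding a term for each newly dominant variant corresponds to the recalibration step described in the abstract.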

## Key findings

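- On data from fourteen countries (including South Korea, the United States, and the United Kingdom) and global aggregates, the integrated model is associated with markedly better accuracy than a single-strain approach, as measured by MAPE, RMSE, and MAE.
- The PELT algorithm is used to justify summing separate variant models once variant dominance reaches approximately 50%.
- The model is recalibrated whenever a new variant emerges, which supports iterative updates for ongoing and future pandemic management.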
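
To make the change-point step more concrete, here is a minimal pure-Python sketch of PELT. The summary does not say which series the authors apply PELT to, which cost function they use, or how the penalty is chosen, so everything specific here is an assumption: the function name `pelt_mean_shift`, the piecewise-constant mean (squared-error) cost, the penalty value, and the synthetic variant-share series.

```python
def pelt_mean_shift(y, penalty):
    """Return change-point indices for a piecewise-constant mean model.

    Minimal PELT with a squared-error segment cost (an assumed cost,
    not necessarily the one used in the paper). An index c in the result
    means a new segment starts at y[c].
    """
    n = len(y)
    # Prefix sums let us compute any segment cost in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):
        # Sum of squared deviations from the mean over y[a:b], with b > a.
        seg_sum = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg_sum * seg_sum / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of y[:t]
    best[0] = -penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in y[:t]
    candidates = [0]
    for t in range(1, n + 1):
        vals = [(best[tau] + cost(tau, t) + penalty, tau) for tau in candidates]
        best[t], last[t] = min(vals)
        # Pruning step: drop candidates that can never be optimal again.
        candidates = [tau for v, tau in vals if v - penalty <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)


# Synthetic share of a new variant: 25% for 20 days, then 75% (toy data).
share = [0.25] * 20 + [0.75] * 20
print(pelt_mean_shift(share, penalty=1.0))  # [20] for this toy series
```

In this toy series the detected change point coincides with the day the share jumps past 50%. Real prevalence data rise gradually, so the penalty and cost would need tuning.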

## Abstract

COVID-19, an unprecedented global pandemic, has caused successive waves that pose unique challenges to public health and epidemiological research. Traditional Susceptible–Infected–Recovered (SIR) models often struggle to capture these complex dynamics, especially given that the virus has spawned multiple sub-variants. To tackle these challenges, we adopt an empirical modeling approach by integrating real-world data from Our World in Data (daily confirmed COVID-19 cases) and GISAID (variant prevalence) into a newly proposed integrated model. Specifically, we sum multiple sigmoidal (logistic) curves, each representing the cumulative infections of a distinct dominant variant, and recalibrate the model whenever a new variant emerges. By incorporating variant-specific parameters, our framework effectively captures the biological and epidemiological characteristics of COVID-19 in a dynamic, data-driven manner. Most importantly, we employ the Pruned Exact Linear Time (PELT) algorithm to provide rigorous mathematical justification for when separate variant models can be legitimately summed: specifically, when variant dominance reaches approximately 50%. This establishes the theoretical foundation that separate logistic models can be additively combined under specific dominance conditions. We evaluate our model using Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Our empirical findings confirm that integrating dominant variants is associated with markedly improved accuracy compared to a single-strain approach, based on data from fourteen countries (including South Korea, the United States, and the United Kingdom) and global aggregates. We also provide theoretical motivation for approximating SIR dynamics via logistic functions and discuss how this integrated framework can be refined to enhance predictive performance. 
In doing so, we demonstrate the feasibility and advantages of iteratively updating epidemiological models in response to emerging variants, thereby offering actionable insights for ongoing and future pandemic management.
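
The summed-logistic model and the three error metrics named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `(K, r, t0)` parameterisation, and all numeric values are hypothetical, and no fitting procedure is shown.

```python
import math


def logistic(t, K, r, t0):
    """Cumulative cases of one wave: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def integrated(t, waves):
    """Sum of per-variant logistic curves; `waves` is a list of (K, r, t0)."""
    return sum(logistic(t, K, r, t0) for K, r, t0 in waves)


def mae(obs, pred):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean Absolute Percentage Error, in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical parameters for two dominant variants (not taken from the paper).
waves = [(1000.0, 0.2, 30.0), (3000.0, 0.15, 100.0)]

# Recalibration when a new variant becomes dominant amounts to appending
# another (K, r, t0) triple and refitting; here we only evaluate the curve.
curve = [integrated(t, waves) for t in range(1, 151)]

# Metrics on a tiny hand-checkable example.
observed = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

Each wave contributes half of its final size at its inflection time `t0`, and the integrated curve approaches the sum of the `K` values for large `t`.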

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** infected (MESH:D007239), COVID-19 (MESH:D000086382), deaths (MESH:D003643), infectious disease (MESH:D003141), SARS (MESH:D045169), SIR (MESH:C562694)
- **Species:** Ebola virus (no rank) [taxon 1570291], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12965675/full.md

---
Source: https://tomesphere.com/paper/PMC12965675