# Accounting for reporting delays in real-time phylodynamic analyses with preferential sampling

**Authors:** Catalina M. Medina, Julia A. Palacios, Volodymyr M. Minin

PMC · DOI: 10.1371/journal.pcbi.1012970 · PLOS Computational Biology · 2025-05-06

## TL;DR

This paper introduces a new method to improve real-time tracking of virus spread by accounting for delays in genetic data reporting.

## Contribution

A novel model that incorporates reporting delays to improve real-time phylodynamic inference of infection trends.

## Key findings

- The proposed method outperforms state-of-the-art approaches at estimating effective population size near the present time, where reporting delays leave recently collected samples unobserved.
- Incorporating reporting delay information reduces bias in real-time phylodynamic analyses.
- The model was illustrated on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.

## Abstract

The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity. These changes in effective population size can be connected to the changes in the number of infections in the population of interest under certain conditions. Phylodynamics is an important set of tools because its methods are often resilient to the ascertainment biases present in traditional surveillance data (e.g., preferentially testing symptomatic individuals). Unfortunately, it takes weeks or months to sequence and deposit the sampled pathogen genetic sequences into a database, making them available for such analyses. These reporting delays severely decrease precision of phylodynamic methods closer to present time, and for some models can lead to extreme biases. Here we present a method that affords reliable estimation of the effective population size trajectory closer to the time of data collection, allowing for policy decisions to be based on more recent data. Our work uses readily available historic times between sampling and reporting of sequenced samples for a population of interest, and incorporates this information into the sampling model to mitigate the effects of reporting delay in real-time analyses. We illustrate our methodology on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.

Estimating the number of individuals infected by a given virus is key for informing dynamic health policy, but it is also a nontrivial task. Reported case data often suffers from sampling biases, preventing accurate inference for a population of interest. Pathogen genetic data provide an alternative data source that can be used in phylodynamic analyses that are more robust to sampling biases. Unfortunately, the time between when a sample is collected and when it is sequenced and available for analysis, which we refer to as the reporting delay, results in unobserved samples near present time for real-time analyses. Missing data can be particularly problematic in methods that model the relationship between the number of samples collected over time and the number of infections. Specifically, the concern for those models is that fewer reported samples near present time would result in lower estimates of the true disease prevalence. We propose a model that incorporates information about recent reporting delays to account for missing samples near present time due to having not been reported by the time of analysis. Using simulated data and SARS-CoV-2 sequences from the state of Washington in 2021, we show that our new method outperforms state-of-the-art methods.
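
The core idea described above, turning historic sampling-to-reporting delays into a correction for samples that are still missing near the present, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reporting_probability`, the toy delay values, and the constant true sampling rate are all hypothetical, and the empirical CDF is used here as one plausible way to summarize historic delays.

```python
import numpy as np


def reporting_probability(historic_delays, time_before_present):
    """Empirical probability that a sample collected `time_before_present`
    days before the analysis date has already been reported.

    Hypothetical helper: uses the empirical CDF of historic delays.
    """
    delays = np.sort(np.asarray(historic_delays, dtype=float))
    t = np.asarray(time_before_present, dtype=float)
    # Fraction of historic delays that are <= t (empirical CDF at t).
    return np.searchsorted(delays, t, side="right") / delays.size


# Toy historic delays in days between sample collection and reporting
# (made-up values for illustration only).
historic_delays = [3, 5, 7, 10, 14, 14, 21, 28, 35, 60]

# Collection times, expressed as days before the analysis date.
days_ago = np.array([0, 7, 14, 30, 90])
p_reported = reporting_probability(historic_delays, days_ago)

# Assumed constant true sampling rate (samples per day), for illustration.
true_rate = 20.0

# Reporting delay thins the samples visible at analysis time, so the
# observed rate drops toward zero as collection time nears the present.
expected_observed_rate = true_rate * p_reported

for d, p, r in zip(days_ago, p_reported, expected_observed_rate):
    print(f"{d:>3} days ago: P(reported) = {p:.1f}, expected observed rate = {r:.1f}")
```

In this toy setting, a model that ties sampling intensity to population size and ignores the delay would read the thinned recent counts as a decline. Scaling the modeled sampling intensity by the reporting probability is one way to express that the drop is an artifact of reporting rather than a change in prevalence.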

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Diseases:** infections (MESH:D007239), COVID-19 (MESH:D000086382), infectious disease (MESH:D003141)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12101774/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12101774/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12101774/full.md

---
Source: https://tomesphere.com/paper/PMC12101774