# Real-time forecasting of data revisions in epidemic surveillance streams

**Authors:** Jingjing Tang, Aaron Rumack, Bryan Wilder, Roni Rosenfeld, Roger Dimitri Kouyos, Philipp Martin Altrock

PMC · DOI: 10.1371/journal.pcbi.1013709 · 2025-11-20

## TL;DR

This paper introduces Delphi-RF, a fast and accurate method for forecasting data revisions in real-time epidemic surveillance.

## Contribution

Delphi-RF uses nonparametric quantile regression to model data revisions, improving accuracy and computational efficiency for public health monitoring.
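
For readers unfamiliar with quantile regression, the standard objective can be written as follows. This is the generic textbook formulation, not the paper's exact model: the symbols $y_i$ (a finalized value), $x_i$ (features built from the revisions available at estimation time), and $q$ (the fitted quantile function) are assumptions introduced here for illustration.

```latex
\rho_\tau(u) = u \,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr),
\qquad
\hat{q}_\tau = \arg\min_{q} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - q(x_i)\bigr),
\qquad 0 < \tau < 1 .
```

Minimizing this "pinball" loss at several levels $\tau$ (for example 0.1, 0.5, 0.9) yields a set of quantiles, which is one common way to obtain a distributional forecast.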

## Key findings

- Delphi-RF delivers accurate forecasts of finalized surveillance values, particularly in early reporting stages.
- The method improves computational efficiency by 10-100x compared to existing approaches.
- It applies to both count and proportion data, demonstrated on daily COVID-19 streams (insurance claims, antigen tests, confirmed cases) and weekly dengue and ILI case counts.

## Abstract

Epidemic data streams undergo frequent revisions due to reporting delays (“backfill”) and other factors. Relying on tentative surveillance values can seriously degrade the quality of situational awareness, forecasting accuracy and decision-making. We introduce Delphi Revision Forecast (Delphi-RF), a real-time data revision forecasting framework using nonparametric quantile regression, applicable to both counts and proportions (fractions) in public health reporting. By incorporating all available revisions up to a given estimation date, Delphi-RF models revision dynamics and generates distributional forecasts of finalized surveillance values. Applied to daily COVID-19 data (insurance claims, antigen tests, confirmed cases) and weekly dengue and influenza-like illness (ILI) case counts, Delphi-RF delivers accurate revision forecasts, particularly in early reporting stages. In addition, it improves computational efficiency by more than 10-100x compared to existing methods, making it a scalable solution for real-time public health surveillance.

Accurate and reliable forecasts of infectious disease epidemics, such as COVID-19, are essential but challenging. The presence of data revisions in public health data streams can introduce significant biases in both predictors and responses, leading to suboptimal situational awareness, preparedness, and downstream countermeasure design. To address this issue, we propose a modeling framework that leverages historical revision patterns to generate distributional forecasts of finalized surveillance values. Applicable to both count-type and fraction-type data across various temporal resolutions and epidemic surveillance data streams, our approach ensures real-time accuracy, even with only early revisions available. Moreover, our method achieves competitive or superior forecast accuracy compared to existing methods, while also demonstrating a more than 10-100x improvement in computational efficiency.
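
The idea of turning historical revision patterns into a distributional forecast can be illustrated with a deliberately simple sketch. It is not Delphi-RF: instead of quantile regression, it takes empirical quantiles of past finalized-to-tentative ratios at a given reporting lag and scales the current tentative value by them. The function names, the `(lag, tentative, finalized)` record layout, and the example numbers are all hypothetical.

```python
def empirical_quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)          # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def forecast_finalized(history, tentative, lag, levels=(0.1, 0.5, 0.9)):
    """Forecast quantiles of the finalized value from a tentative report.

    history   -- list of (lag, tentative_value, finalized_value) tuples
                 from past reference dates (hypothetical layout)
    tentative -- the value currently reported at the given lag
    lag       -- reporting periods elapsed since the reference date
    """
    # Revision ratios observed at the same lag in the past.
    ratios = [final / early
              for (l, early, final) in history
              if l == lag and early > 0]
    if not ratios:
        raise ValueError(f"no usable history at lag {lag}")
    # Scale the tentative value by each empirical ratio quantile.
    return {q: tentative * empirical_quantile(ratios, q) for q in levels}


# Made-up example: five past reference dates observed at lag 1,
# plus one at lag 2 that is ignored when forecasting at lag 1.
history = [
    (1, 50, 100), (1, 40, 100), (1, 60, 90), (1, 80, 100), (1, 50, 75),
    (2, 90, 100),
]
forecast = forecast_finalized(history, tentative=60, lag=1)
for level, value in sorted(forecast.items()):
    print(f"q={level:.1f}: {value:.1f}")
```

A ratio-based sketch like this only makes sense for positive counts and ignores covariates such as day of week or recent trend; a regression-based approach, as the abstract describes, can condition on such features and on all revisions available at the estimation date.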

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096), dengue (MONDO:0005502)

## Full-text entities

- **Diseases:** ILI (MESH:D007251), COVID-19 (MESH:D000086382), dengue (MESH:D003715)

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12646461/full.md

---
Source: https://tomesphere.com/paper/PMC12646461