# Ensemble-labeling of infectious disease time series to evaluate early warning systems

**Authors:** Andreas Hicketier, Moritz Bach, Philip Oedi, Alexander Ullrich, Auss Abbood

PMC · DOI: 10.1016/j.idm.2025.12.013 · 2025-12-23

## TL;DR

The paper introduces an adaptive ensemble method that retrospectively labels outbreaks and waves in heterogeneous COVID-19 time series, so that early warning systems can be benchmarked, and supervised detectors trained, on real surveillance data.

## Contribution

An adaptive ensemble labeling method for heterogeneous disease time series that improves benchmarking and supervised model training.

## Key findings

- The method consistently produces useful outbreak labels for various outbreak types and spatial resolutions.
- Supervised models trained with generated labels outperform traditional unsupervised outbreak detection methods.
- The approach allows systematic benchmarking of outbreak detection systems on real surveillance data.
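
To make the benchmarking idea concrete, the sketch below scores a detector's alarms against a labeled series. It is only an illustration: the summary does not state which evaluation metrics the paper uses, so the point-wise precision, sensitivity, and F1 shown here, as well as the function name `score_alarms`, are assumptions.

```python
from typing import Dict, Sequence


def score_alarms(labels: Sequence[int], alarms: Sequence[int]) -> Dict[str, float]:
    """Compare binary alarms with binary outbreak labels, time point by time point.

    Hypothetical helper: the metrics are a common choice, not taken from the paper.
    """
    if len(labels) != len(alarms):
        raise ValueError("labels and alarms must have the same length")

    # Count agreement and disagreement between alarms and labels.
    tp = sum(1 for y, a in zip(labels, alarms) if y == 1 and a == 1)
    fp = sum(1 for y, a in zip(labels, alarms) if y == 0 and a == 1)
    fn = sum(1 for y, a in zip(labels, alarms) if y == 1 and a == 0)

    # Guard against division by zero when there are no alarms or no labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1, 0, 0, 0]  # made-up outbreak labels
    alarms = [0, 1, 1, 1, 0, 0, 0, 0]  # made-up detector output
    print(score_alarms(labels, alarms))
```

Point-wise scoring is the simplest option; an evaluation that also rewards timeliness would need event-level matching, which this sketch does not attempt.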

## Abstract

Early warning systems (EWSs) for detecting disease outbreaks can help make informed public health decisions and organize necessary responses. During the COVID-19 pandemic, several EWSs were proposed that use covariates such as mobility or social media data for improved timeliness and precision. Evaluating these EWSs is not trivial, since we do not have the ground truth knowledge about outbreaks of COVID-19. Workarounds for missing labels are to simulate them or produce them post hoc. Simulating COVID-19 outbreaks for evaluation is not feasible with highly complex covariates such as mobility. Furthermore, existing post hoc labeling methods do not perform well on heterogeneous COVID-19 time series. To address this evaluation gap, we propose an adaptive labeling method that produces useful labels (time-indexed annotations marking outbreak-like periods) for highly heterogeneous, nonstationary COVID-19 time series. To this end, we develop a customized ensemble of labeling methods. We find that our method consistently produces useful labels for various outbreak types, such as waves and short peaks occurring at different spatial resolutions. Lastly, we use our self-produced labels to train machine learning models and compare their performance with traditional outbreak detection methods. We find that models trained with our labels outperform classical, unsupervised outbreak detection algorithms.
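
The general idea of an ensemble of labeling methods can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the ensemble members or how they are combined, so the three simple labelers, the vote-count rule, and all names and thresholds below are assumptions chosen for readability.

```python
from statistics import median
from typing import Callable, List, Sequence

# A labeler maps a case-count series to one 0/1 label per time point.
Labeler = Callable[[Sequence[float]], List[int]]


def above_trailing_median(series: Sequence[float], window: int = 4,
                          factor: float = 1.5) -> List[int]:
    """Flag points that exceed `factor` times the median of the preceding window."""
    labels = []
    for t, value in enumerate(series):
        history = series[max(0, t - window):t]
        baseline = median(history) if history else value
        labels.append(int(value > factor * baseline))
    return labels


def rising_fast(series: Sequence[float], growth: float = 1.3) -> List[int]:
    """Flag points whose ratio to the previous point is at least `growth`."""
    labels = [0]  # the first point has no predecessor
    for prev, cur in zip(series, series[1:]):
        labels.append(int(prev > 0 and cur / prev >= growth))
    return labels


def above_overall_mean(series: Sequence[float]) -> List[int]:
    """Flag points above the mean of the whole series."""
    mean = sum(series) / len(series)
    return [int(value > mean) for value in series]


def ensemble_labels(series: Sequence[float], labelers: Sequence[Labeler],
                    min_votes: int) -> List[int]:
    """Label a point as outbreak-like when at least `min_votes` labelers flag it."""
    votes = [labeler(series) for labeler in labelers]
    return [int(sum(column) >= min_votes) for column in zip(*votes)]


if __name__ == "__main__":
    cases = [10, 11, 9, 10, 30, 60, 55, 20, 10, 9]  # made-up weekly counts
    members = [above_trailing_median, rising_fast, above_overall_mean]
    print(ensemble_labels(cases, members, min_votes=2))
```

Raising `min_votes` makes the labels more conservative, since more labelers must agree before a point is marked. An adaptive scheme like the one the paper describes would presumably tune such choices per series, which this fixed-threshold sketch does not do.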

- New adaptive labeling method for disease time series to retrospectively identify both outbreaks and waves.
- Generated labels allow for systematic benchmarking of outbreak detection methods on real surveillance data at different spatial resolutions, from regional to national data.
- Generated labels enable training of supervised machine learning models that outperform classical unsupervised statistical outbreak detection methods.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** infectious disease (MESH:D003141), COVID-19 (MESH:D000086382)

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12905992/full.md

---
Source: https://tomesphere.com/paper/PMC12905992