# AI-Enabled Diagnostic Prediction within Electronic Health Records to Enhance Biosurveillance and Early Outbreak Detection

**Authors:** Andre Goncalves, Jose Cadena, Yeping Hu, David Schlessinger, John Greene, Liam O’suilleabhain, Heather Clancy, Michael Vollmer, Vincent Liu, Tom Bates, Priyadip Ray

PMC · DOI: 10.21203/rs.3.rs-6606632/v1 · 2025-06-12

## TL;DR

This paper introduces a machine learning method that improves early detection of infectious disease outbreaks by analyzing electronic health records.

## Contribution

The novel contribution is integrating ML-based diagnostic predictions with traditional surveillance to enhance biosurveillance and outbreak detection.

## Key findings

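- Among diseases with five or more outbreaks between 2014 and 2022, 33.3% of outbreaks (41 of 123) were detected earlier, with lead times of 1 to 24 days.
- The system produced an average of 1.33 false-positive outbreak detections per year.
- Combining high-confidence ML predictions with final diagnoses enabled faster outbreak detection, provided adequate disease-specific training data is available.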

## Abstract

Detecting infectious disease outbreaks promptly is crucial for effective public health responses, minimizing transmission, and enabling critical interventions. This study introduces a method that integrates machine learning (ML)-based diagnostic predictions with traditional epidemiological surveillance to enhance biosurveillance systems. Using 4.5 million patient records from 2010 to 2022, ML models were trained to predict, within 24-hour intervals, the likelihood of patients being diagnosed with infectious or unspecified gastrointestinal, respiratory, or neurological diseases. High-confidence predictions were combined with final diagnoses and analyzed using spatiotemporal outbreak detection techniques. Among diseases with five or more outbreaks between 2014 and 2022, 33.3% (41 of 123 outbreaks) were detected earlier, with lead times ranging from 1 to 24 days and an average of 1.33 false positive outbreaks detected annually. This approach demonstrates the potential of integrating ML with conventional methods for faster outbreak detection, provided adequate disease-specific training data is available.
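The idea of adding high-confidence predictions to final diagnoses before running outbreak detection can be illustrated with a toy example. The sketch below is not the authors' pipeline. It assumes a hypothetical confidence cutoff of 0.9, synthetic data, and a simple purely temporal threshold rule (mean plus three standard deviations over a 7-day baseline) in place of the spatiotemporal techniques the abstract mentions. It also ignores deduplication of patients who are both predicted and later confirmed.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, stdev

# Hypothetical cutoff for a "high-confidence" prediction (an assumption,
# not a value taken from the paper).
CONFIDENCE_THRESHOLD = 0.9


def daily_counts(confirmed_dates, predictions, threshold=CONFIDENCE_THRESHOLD):
    """Count cases per day: confirmed diagnoses plus high-confidence predictions.

    confirmed_dates: list of dates, one entry per final diagnosis.
    predictions: list of (date, probability) pairs from a hypothetical model.
    """
    counts = Counter(confirmed_dates)
    for day, prob in predictions:
        if prob >= threshold:
            counts[day] += 1
    return counts


def first_alert(counts, start, end, baseline_days=7, z=3.0):
    """Return the first day whose count exceeds mean + z * stdev of the
    preceding baseline window, or None if no day does.

    This is a simple stand-in detector, assumed for illustration only.
    """
    day = start + timedelta(days=baseline_days)
    while day <= end:
        window = [counts.get(day - timedelta(days=k), 0)
                  for k in range(1, baseline_days + 1)]
        # Floor the spread at 1.0 so a perfectly flat baseline does not
        # trigger an alert on a one-case increase.
        limit = mean(window) + z * max(stdev(window), 1.0)
        if counts.get(day, 0) > limit:
            return day
        day += timedelta(days=1)
    return None


# Synthetic data: one confirmed case per day, plus a surge of eight
# confirmed diagnoses on 12 January.
start, end = date(2022, 1, 1), date(2022, 1, 14)
confirmed = [start + timedelta(days=i) for i in range(14)]
confirmed += [date(2022, 1, 12)] * 8

# The hypothetical model flags the same eight patients two days earlier;
# two low-confidence predictions are ignored.
predictions = [(date(2022, 1, 10), 0.95)] * 8 + [(date(2022, 1, 9), 0.5)] * 2

alert_confirmed = first_alert(daily_counts(confirmed, []), start, end)
alert_augmented = first_alert(daily_counts(confirmed, predictions), start, end)
lead_time_days = (alert_confirmed - alert_augmented).days
print(alert_confirmed, alert_augmented, lead_time_days)
```

In this synthetic case the augmented series raises an alert on 10 January and the confirmed-only series on 12 January, a lead time of 2 days. A lower confidence cutoff would add more predicted cases and could raise earlier alerts at the cost of more false positives, which is the trade-off the reported false-positive rate reflects.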

## Full-text entities

- **Diseases:** gastrointestinal, respiratory, or neurological diseases (MESH:D012140), infectious disease (MESH:D003141)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12204487/full.md

---
Source: https://tomesphere.com/paper/PMC12204487