AI-Enabled Diagnostic Prediction within Electronic Health Records to Enhance Biosurveillance and Early Outbreak Detection
Andre Goncalves, Jose Cadena, Yeping Hu, David Schlessinger, John Greene, Liam O’suilleabhain, Heather Clancy, Michael Vollmer, Vincent Liu, Tom Bates, Priyadip Ray

TL;DR
This paper introduces a machine learning method that improves early detection of infectious disease outbreaks by analyzing electronic health records.
Contribution
The novel contribution is integrating ML-based diagnostic predictions with traditional surveillance to enhance biosurveillance and outbreak detection.
Findings
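Among diseases with five or more outbreaks between 2014 and 2022, 33.3% of outbreaks (41 of 123) were detected earlier, with lead times of 1 to 24 days.
The system flagged an average of 1.33 false-positive outbreaks per year.
Combining high-confidence ML predictions with final diagnoses allowed some outbreaks to be detected before final diagnoses alone would have revealed them.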
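The summary does not spell out how lead time is measured. A minimal sketch, assuming lead time is the number of days between the alarm raised by a diagnosis-only stream and the alarm raised by the combined (predictions plus diagnoses) stream for the same outbreak. The helper names `lead_times` and `fraction_earlier` are hypothetical, and the alarm dates are made up for illustration, not taken from the paper. Under this definition the reported figure is simply 41 / 123 = 0.333.

```python
from datetime import date

def lead_times(baseline_alarms, augmented_alarms):
    """Lead time in days per outbreak: positive means the augmented
    (predictions + diagnoses) stream alarmed before the diagnosis-only stream."""
    leads = {}
    for outbreak_id, base_day in baseline_alarms.items():
        aug_day = augmented_alarms.get(outbreak_id)
        if aug_day is None:
            continue  # augmented stream never flagged this outbreak
        leads[outbreak_id] = (base_day - aug_day).days
    return leads

def fraction_earlier(leads, n_outbreaks):
    """Share of all outbreaks that the augmented stream caught strictly earlier."""
    return sum(1 for d in leads.values() if d > 0) / n_outbreaks

# Illustrative, made-up alarm dates (not data from the paper).
baseline = {"A": date(2019, 1, 10), "B": date(2019, 3, 5), "C": date(2020, 7, 1)}
augmented = {"A": date(2019, 1, 7), "B": date(2019, 3, 5), "C": date(2020, 6, 20)}
leads = lead_times(baseline, augmented)
print(leads)                                    # {'A': 3, 'B': 0, 'C': 11}
print(fraction_earlier(leads, len(baseline)))   # 0.6666666666666666
```

Outbreak "B" has a lead time of 0, so it counts as detected but not earlier; only strictly positive lead times contribute to the "detected earlier" share.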
Abstract
Detecting infectious disease outbreaks promptly is crucial for effective public health responses, minimizing transmission, and enabling critical interventions. This study introduces a method that integrates machine learning (ML)-based diagnostic predictions with traditional epidemiological surveillance to enhance biosurveillance systems. Using 4.5 million patient records from 2010 to 2022, ML models were trained to predict, within 24-hour intervals, the likelihood of patients being diagnosed with infectious or unspecified gastrointestinal, respiratory, or neurological diseases. High-confidence predictions were combined with final diagnoses and analyzed using spatiotemporal outbreak detection techniques. Among diseases with five or more outbreaks between 2014 and 2022, 33.3% (41 of 123 outbreaks) were detected earlier, with lead times ranging from 1 to 24 days and an average of 1.33…
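The abstract describes the pipeline only at a high level: high-confidence daily predictions are merged with final diagnoses, and the merged counts are scanned for outbreaks. The sketch below illustrates that idea under several assumptions that are not from the paper: a single syndrome, a hypothetical confidence threshold of 0.9, each patient counted once at the earliest date they appear, and an EARS C2-style detector (today's count against a 7-day baseline lagged by 2 days) as a simple stand-in for the spatiotemporal techniques the authors actually used, which the abstract does not name. `Event`, `merge_events`, `c2_alarms`, and `CONF_THRESHOLD` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, stdev

CONF_THRESHOLD = 0.9  # hypothetical cut-off for a "high-confidence" prediction

@dataclass
class Event:
    patient_id: str
    region: str
    day: date
    source: str         # "prediction" or "diagnosis"
    score: float = 1.0  # model probability; final diagnoses treated as certain

def merge_events(events, threshold=CONF_THRESHOLD):
    """Keep final diagnoses plus high-confidence predictions and count each
    patient once, on the earliest day they appear (a simplifying assumption)."""
    first_seen = {}
    for ev in events:
        if ev.source == "prediction" and ev.score < threshold:
            continue  # drop low-confidence predictions
        prev = first_seen.get(ev.patient_id)
        if prev is None or ev.day < prev.day:
            first_seen[ev.patient_id] = ev
    counts = defaultdict(int)
    for ev in first_seen.values():
        counts[(ev.region, ev.day)] += 1
    return counts

def c2_alarms(counts, region, start, end, baseline_days=7, guard_days=2, z_crit=3.0):
    """EARS C2-style alarm: flag days whose count exceeds the mean of a lagged
    baseline window by more than z_crit standard deviations."""
    alarms = []
    day = start
    while day <= end:
        # Baseline covers days t-3 .. t-9 when guard_days=2 and baseline_days=7.
        base = [counts.get((region, day - timedelta(days=guard_days + k)), 0)
                for k in range(1, baseline_days + 1)]
        mu, sd = mean(base), stdev(base)
        today = counts.get((region, day), 0)
        z = (today - mu) / max(sd, 1.0)  # assumed floor on sd to avoid division by ~0
        if z > z_crit:
            alarms.append(day)
        day += timedelta(days=1)
    return alarms
```

Because a high-confidence prediction usually precedes the final diagnosis, deduplicating on the earliest date moves each case earlier in time; in this sketch, that is the only mechanism by which the combined stream can alarm sooner than a diagnosis-only stream.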
Figure 1
Figure 2
Figure 3
Taxonomy
Topics: Data-Driven Disease Surveillance · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI
