# Predicting Infectious Disease Incidence After Flooding Using Artificial Intelligence Models: A Retrospective Pre–Post Cohort Study Using Routinely Collected EHR Data

**Authors:** Mehdi Safari, Alireza Zali, Hossein Hatami, Elnaz Amanzadeh Jajin, Meisam Akhlaghdoust

PMC · DOI: 10.1002/hsr2.71470 · Health Science Reports · 2025-11-06

## TL;DR

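This study trains machine learning models on routinely collected electronic health records to predict infectious disease after a flood. Infectious disease prevalence rose from 39.5% to 47.3% post-flood, and the best model (Random Forest) reached only moderate discrimination (AUC = 0.76).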

## Contribution

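The study develops and evaluates five machine learning classifiers for predicting post-flood infectious disease from real-world health records. It shows that the approach is feasible, but that performance stays modest without additional environmental and socioeconomic variables.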

## Key findings

- Infectious disease prevalence rose from 39.5% before the flood to 47.3% after it (p < 0.001), an odds ratio of 1.38 (95% CI: 1.09–1.75).
- Random Forest achieved the highest predictive performance (AUC = 0.76) of the five machine learning models tested; Logistic Regression scored lowest (AUC = 0.69).
- Age and visit date were the most important predictive features, with younger patients showing higher post-flood disease rates.

## Abstract

Natural disasters, particularly floods, significantly increase infectious disease risk through environmental contamination and healthcare system disruption. Despite well‐documented flood‐disease associations, predictive models for post‐disaster epidemiological surveillance remain limited. We aimed to develop and validate machine learning algorithms to predict infectious disease incidence following flood events.

We conducted a retrospective pre–post cohort study using routinely collected electronic health records from Firuzkuh County health centers, comparing a 30‐day pre‐flood cohort (July–August 2021; n = 461) with a 30‐day post‐flood cohort (July–August 2022; n = 478). Five classifiers (Random Forest, Logistic Regression, linear SVM, Gradient Boosting, and ANN) were trained and evaluated on a held‐out test set using AUC.
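For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how AUC can be computed on a held-out test set, using the standard pairwise definition: the share of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the example labels and scores are hypothetical illustrations, not data or code from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive case (e.g. infectious disease), 0 otherwise.
    scores: model-predicted scores or probabilities, higher = more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels and scores, for illustration only.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.5, 0.2, 0.7, 0.6]
print(f"AUC = {roc_auc(y_test, scores):.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of test cases, which is fine for cohorts of a few hundred records; library implementations typically use a sort-based method instead.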

Post‐flood infectious disease prevalence increased significantly from 39.5% to 47.3% (p < 0.001), with an odds ratio of 1.38 (95% CI: 1.09–1.75) and attributable risk of 7.8 percentage points. Among machine learning models, Random Forest achieved the highest predictive performance (AUC = 0.76), followed by Gradient Boosting (0.74), Artificial Neural Network (0.72), Support Vector Machine (0.71), and Logistic Regression (0.69). Age and visit date emerged as the most important predictive features across all models. Unexpectedly, younger patients (mean age 51.0 years) showed higher post‐flood infectious disease rates compared to older patients (mean age 57.9 years) in the pre‐flood period.
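As a rough check on the effect sizes above, the sketch below computes a crude (unadjusted) odds ratio and risk difference directly from the two reported prevalences. The function names are hypothetical, and this summary does not state how the authors derived their estimate, so small differences from the published figures are to be expected: the crude calculation from the rounded percentages gives an odds ratio of about 1.37, close to the reported 1.38.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Crude odds ratio comparing two proportions."""
    return odds(p_exposed) / odds(p_reference)


def risk_difference_pp(p_exposed: float, p_reference: float) -> float:
    """Risk difference (attributable risk) in percentage points."""
    return (p_exposed - p_reference) * 100.0


# Prevalences reported in the abstract (rounded to one decimal place).
pre_flood = 0.395
post_flood = 0.473

print(f"Crude odds ratio: {odds_ratio(post_flood, pre_flood):.3f}")
print(f"Risk difference:  {risk_difference_pp(post_flood, pre_flood):.1f} percentage points")
```

The risk difference reproduces the reported 7.8 percentage points exactly, since it is a simple subtraction of the two prevalences.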

Machine learning models demonstrated moderate predictive performance for post‐flood infectious disease occurrence. While results show feasibility for AI‐based disaster epidemiology, the modest performance indicates that incorporating additional environmental and socioeconomic variables is essential for developing clinically actionable prediction systems for public health emergency response.

## Linked entities

- **Diseases:** infectious disease (MONDO:0005550)

## Full-text entities

- **Diseases:** flood (MESH:C565009), Infectious Disease (MESH:D003141)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12592683/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12592683/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12592683/full.md

---
Source: https://tomesphere.com/paper/PMC12592683