# Acute respiratory infection (COVID-19) risk prediction in travelers: A random forest model

**Authors:** Jingbo Yu, Hao Yu, Yuming Wang, Qiang Zeng

PMC · DOI: 10.1016/j.idm.2025.12.021 · Infectious Disease Modelling · 2026-02-26

## TL;DR

This paper develops a random forest model to predict the risk of COVID-19 infection in travelers, using variables extracted from passenger itineraries such as travel history and seat numbers.

## Contribution

The study introduces a random forest-based prediction model that outperforms multivariate logistic regression in identifying infected travelers.

## Key findings

- The random forest model showed better discriminative ability and calibration than logistic regression.
- Travel history-derived factors like close contacts, flight risk, and sojourn risk were top predictors.
- Infection prevalence was significantly higher in high-risk groups than in low-risk groups (e.g., 2.0% vs 0.7% for sojourn risk).

## Abstract

Early screening during outbreaks of acute respiratory infections (ARIs) is critical for controlling disease spread among international travelers. However, the massive volume of traveler data generated in a short timeframe makes manual screening of suspected cases impractical for health quarantine officers. Prediction models for infection offer a promising solution to this challenge.

Key predictive variables, including travel history and seat numbers, were extracted from passenger itineraries to construct the risk assessment model. The random forest algorithm and multivariate logistic regression were each used to build a separate prediction model of COVID-19 infection. Their performance was compared using sensitivity (recall for the positive class), specificity, accuracy, AUC, and Brier score. Variables were ranked by importance using the random forest algorithm.
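The comparison metrics named above can be made concrete with a small, dependency-free sketch. It assumes each model outputs a predicted infection probability per traveler and that a traveler is flagged as positive when that probability reaches a threshold; the 0.5 default, the function names, and the toy labels are illustrative assumptions, not the authors' settings or data. AUC is computed here by pairwise ranking (the probability that a randomly chosen infected traveler scores higher than a randomly chosen uninfected one), which equals the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 = infected."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy after flagging travelers with prob >= threshold.

    Assumes both classes are present in y_true (the threshold is a hypothetical default).
    """
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # recall for the positive class
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """ROC AUC via pairwise ranking: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome; lower is better."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical toy example (not data from the study): 1 = infected traveler.
y_true = [1, 0, 1, 0, 0, 1]
probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3]
print(threshold_metrics(y_true, probs))  # sensitivity, specificity and accuracy are each 2/3 here
print(auc(y_true, probs))                # 6 of 9 positive/negative pairs ranked correctly: 2/3
print(brier_score(y_true, probs))        # 1.35 / 6 = 0.225 (up to float rounding)
```

With infection prevalence in the low single-digit percentages, as in the strata reported here, a fixed 0.5 cutoff would likely flag very few travelers, so the threshold would normally be tuned; AUC and the Brier score work directly on the probabilities and do not depend on any threshold.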

The random forest model exhibited better discriminative ability and calibration than the logistic regression model. Variable importance analysis identified travel history-derived factors as the top predictors: close contacts (0.419), flight risk (0.286), and sojourn risk (0.182). Infection prevalence stratified by risk level was as follows: flight risk, low vs high risk, 0.7% vs 1.4%; sojourn risk, low vs high risk, 0.7% vs 2.0%; close contacts vs non-close contacts, 0.3% vs 2.4%.
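Stratified prevalence is the share of infected travelers within each risk group, and a prevalence ratio compares two groups. The sketch below assumes one (stratum, infected) record per traveler; the function names are hypothetical, and the cohort is a toy constructed only to reproduce the reported flight-risk percentages, not the study's data. The ratios at the end are computed from the percentages reported above.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def stratified_prevalence(records: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, float]:
    """Share of infected travelers in each stratum, from (stratum, infected) records."""
    totals: Dict[Hashable, int] = defaultdict(int)
    cases: Dict[Hashable, int] = defaultdict(int)
    for stratum, infected in records:
        totals[stratum] += 1
        cases[stratum] += int(infected)
    return {s: cases[s] / totals[s] for s in totals}


def prevalence_ratio(exposed: float, reference: float) -> float:
    """How many times more common infection is in one group than in a reference group."""
    return exposed / reference


# Hypothetical cohort (NOT the study's data): 7 infections among 1,000 low-risk
# and 7 among 500 high-risk travelers, matching the reported 0.7% vs 1.4% flight-risk split.
records = [("low", i < 7) for i in range(1000)] + [("high", i < 7) for i in range(500)]
prev = stratified_prevalence(records)
print(prev)  # {'low': 0.007, 'high': 0.014}

# Ratios from the percentages reported in the abstract (high risk vs low risk).
print(prevalence_ratio(1.4, 0.7))            # flight risk: 2.0
print(round(prevalence_ratio(2.0, 0.7), 2))  # sojourn risk: 2.86
```

The same function applies unchanged to any other categorical grouping derived from itineraries, such as close-contact status.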

The prediction model based on the random forest algorithm performed better at identifying infected passengers than the multivariate logistic regression model. More attention should be paid to variables extracted from epidemiological history when building prediction models for respiratory infectious diseases. This model shows strong potential for responding effectively to future outbreaks of acute infectious diseases such as COVID-19.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** infectious disease (MESH:D003141), SARS (MESH:D045169), died (MESH:D003643), influenza (MESH:D007251), MERS (MESH:D018352), ARIs (MESH:D012141), Symptom (MESH:D012816), Infection (MESH:D007239), COVID-19 (MESH:D000086382)
- **Chemicals:** acid (MESH:D000143)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12969110/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12969110/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12969110/full.md

---
Source: https://tomesphere.com/paper/PMC12969110