# Enhancing COVID-19 Screening Models With Epidemiological and Mobility Features: Machine-Learning Model Study

**Authors:** Hyunwoo Choo, Dohyung Lee, Soo-Yong Shin, Jiwoo Lee, Duhun Lee, Eonji Kim, Namsoo Oh, Christina Kim, Myeongchan Kim, Hyo Jung Kim

PMC · DOI: 10.2196/54956 · 2026-03-05

## TL;DR

This study shows that adding mobility and epidemic data to symptom-based machine learning models improves their performance in screening for COVID-19 infection.

## Contribution

The novel use of mobility and epidemic data alongside patient symptoms enhances ML model performance for COVID-19 screening.

## Key findings

- Combining mobility and epidemic data with symptoms improved ML model performance for diagnosing COVID-19.
- The highest area under the receiver operating characteristic curve (AUROC) increased from 0.8712 to 0.9104 with the inclusion of mobility and epidemic data.
- External contextual data significantly improved the performance of all 5 ML-based screening models.

## Abstract

Despite the significant post–COVID-19 pandemic surge in research using symptom data and machine learning (ML) for patient screening, data on patient trajectories and epidemiological conditions, although crucial, have remained underused.

This study aimed to enhance the performance of ML models for COVID-19 screening by incorporating mobility and epidemic information in addition to patient symptom data.

Data, including daily self-reported symptoms, location information, and test results, were collected from 48,798 individuals using a smartphone app. These data were then combined with Our World in Data and national government epidemic information to train 5 ML-based screening models to classify patient infection status. The models were logistic regression, extreme gradient boosting, light gradient boosting machine, tabular data network, and Google AutoML.
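The data-combination step described above can be pictured as a join between per-user daily reports and an epidemic table keyed by day and region. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the field names (`user_id`, `day`, `region`, `fever`, `cough`, `regional_incidence`), the helper `add_context`, and all values are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical daily app reports: one row per user per day (illustrative values only).
symptom_reports = [
    {"user_id": 1, "day": date(2021, 1, 5), "region": "A", "fever": 1, "cough": 0},
    {"user_id": 2, "day": date(2021, 1, 5), "region": "B", "fever": 0, "cough": 1},
    {"user_id": 1, "day": date(2021, 1, 6), "region": "C", "fever": 1, "cough": 1},
]

# Hypothetical epidemic table keyed by (day, region); the value stands in for
# any epidemic indicator, for example new cases per 100,000 residents.
epidemic_by_day_region = {
    (date(2021, 1, 5), "A"): 12.0,
    (date(2021, 1, 5), "B"): 3.5,
}


def add_context(reports, epidemic):
    """Return copies of the reports with an epidemic feature attached.

    Rows with no matching (day, region) entry get None, so the caller can
    decide how to handle missing context instead of silently dropping rows.
    """
    enriched = []
    for row in reports:
        out = dict(row)  # copy so the original report is left unchanged
        out["regional_incidence"] = epidemic.get((row["day"], row["region"]))
        enriched.append(out)
    return enriched


feature_rows = add_context(symptom_reports, epidemic_by_day_region)
```

Each enriched row then carries both symptom features and contextual features, which is the form a tabular classifier would consume. Keeping unmatched rows with an explicit `None` is one design choice among several; imputing or excluding them are alternatives.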

The addition of mobility and epidemic data significantly improved the performance of all 5 models. The highest area under the receiver operating characteristic curve score increased from 0.8712 without mobility and epidemic data to 0.9104 with mobility and epidemic data. This highlights the considerable impact of external information on enhancing the performance of ML models.
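For readers unfamiliar with the metric, the area under the receiver operating characteristic curve (AUROC) equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy labels and scores are assumptions made for illustration and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): the same six cases scored by two hypothetical models.
labels = [1, 1, 1, 0, 0, 0]
symptoms_only_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]
with_context_scores = [0.9, 0.6, 0.8, 0.5, 0.3, 0.7]

auroc_symptoms_only = auroc(labels, symptoms_only_scores)
auroc_with_context = auroc(labels, with_context_scores)
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based implementations in standard libraries give the same value more efficiently on large datasets.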

This study demonstrated the potential of using contextual data, such as location information and epidemic statistics, in combination with patient symptom data to improve the performance of ML models for COVID-19 screening. Considering additional contextual information can enhance the ability to screen for COVID-19.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** CMPK1 (cytidine/uridine monophosphate kinase 1) [NCBI Gene 51727] {aka CK, CMK, CMPK, UMK, UMP-CMPK, UMPK}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** infected (MESH:D007239), chills (MESH:D023341), COVID symptom (MESH:D000086382), cough (MESH:D003371), loss of smell (MESH:D000086582), Respiratory symptoms (MESH:D012818), Sore throat (MESH:D010612), deaths (MESH:D003643), respiratory illnesses (MESH:D012140), influenza (MESH:D007251), malaria (MESH:D008288), Zika (MESH:D000071243), fever (MESH:D005334), Ebola (MESH:D019142), infectious disease (MESH:D003141), respiratory (MESH:D012131), loss of taste (MESH:D000370)
- **Species:** Homo sapiens (human, species) [taxon 9606], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978548/full.md

---
Source: https://tomesphere.com/paper/PMC12978548