# Predictive Models for Early Infection Detection in Nursing Home Residents: Evaluation of Imputation Techniques and Complementary Data Sources

**Authors:** Melisa Granda, María Santamera-Lastras, Alberto Garcés-Jiménez, Francisco Javier Bueno-Guillén, Diego María Rodríguez-Puyol, José Manuel Gómez-Pulido

PMC · DOI: 10.3390/healthcare14020166 · 2026-01-08

## TL;DR

This study shows that adding social media and air pollution data to daily physiological measurements improves early infection detection in elderly nursing home residents, extending the predictive horizon to a 6-day lead time.

## Contribution

The study integrates social media and air pollution signals with daily physiological measurements, and adds a Basal Module that personalizes each resident's physiological baseline, to improve early infection prediction in nursing homes.

## Key findings

- Social media integration provides a 6-day lead time for infection prediction with an F1-score of 0.97.
- Air pollution data supports robust immediate detection ("nowcasting").
- Multiclass models using external data achieve over 90% sensitivity for specific infections, such as acute respiratory and urinary tract infections.
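
The metrics quoted above follow from confusion-matrix counts, and a lead time amounts to pairing each day's features with the infection status some days later. The following is a minimal sketch under stated assumptions: daily binary labels per resident, with function names and toy inputs that are hypothetical and not taken from the paper.

```python
def f1_and_sensitivity(y_true, y_pred):
    """Return (F1, sensitivity) for binary labels given as 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return f1, sensitivity


def lead_time_labels(daily_infected, lead_days=6):
    """Label day t with the infection status observed on day t + lead_days.

    The last `lead_days` entries have no future observation and are None.
    (Hypothetical helper; the paper's labeling scheme is not given here.)
    """
    n = len(daily_infected)
    return [
        daily_infected[t + lead_days] if t + lead_days < n else None
        for t in range(n)
    ]
```

Days labeled `None` would be dropped before training, since their outcome lies beyond the observed record.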

## Abstract

Background: Aging in Western societies poses a growing challenge, placing increasing pressure on healthcare costs. Early identification of infections in elderly nursing home residents is crucial to reduce complications, mortality, and the burden on emergency departments.

Methods: We performed a comparative analysis of machine learning models using XGBoost classifiers for infection detection, addressing incomplete daily physiological measurements (Heart Rate, Oxygen Saturation, Body Temperature, and Electrodermal Activity) through strict imputation protocols. We evaluated three model variants (Basic, using clinical data only; Air Pollution-added; and Social Media-integrated) while incorporating a novel Basal Module to personalize physiological baselines for each resident.

Results: The binary model indicates that physiological data provide a necessary baseline for immediate screening. Notably, social media integration emerged as a powerful forecasting tool, extending the predictive horizon to a 6-day lead time with an F1-score of 0.97. Complementarily, air pollution data ensured robust immediate detection (“nowcasting”). In the multiclass scenario, external data resolved the “semantic gap” of vital signs, improving sensitivity for specific infections (e.g., acute respiratory and urinary tract infections) to over 90%.

Conclusions: These findings highlight that the strategic integration of environmental and digital signals transforms the system from a reactive monitor into a proactive early warning tool for long-term care facilities.
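
The abstract does not specify the imputation rules or the internals of the Basal Module, so the following is only an illustrative sketch of the two ideas: filling short gaps in a daily vital-sign series, and expressing each reading relative to the resident's own recent history. The gap limit, the window length, the use of a median, and both function names are assumptions, not the authors' method.

```python
from statistics import median


def forward_fill_limited(values, max_gap=2):
    """Fill None entries with the last observed value, for at most
    `max_gap` consecutive missing days; longer gaps stay None.
    (Assumed rule; the paper's imputation protocol is not given here.)"""
    filled, last, run = [], None, 0
    for v in values:
        if v is not None:
            last, run = v, 0
            filled.append(v)
        else:
            run += 1
            filled.append(last if last is not None and run <= max_gap else None)
    return filled


def baseline_deviation(values, window=7):
    """Subtract the median of up to `window` preceding observed values.

    Only past days are used, so a reading never contributes to its own
    baseline. Days with no usable history yield None.
    (Assumed stand-in for a personalized baseline.)"""
    out = []
    for i, v in enumerate(values):
        history = [x for x in values[max(0, i - window):i] if x is not None]
        if v is None or not history:
            out.append(None)
        else:
            out.append(v - median(history))
    return out
```

For example, a resident whose temperature is steady at 36.0 and then reads 38.0 gets a deviation of 2.0 on that day, regardless of what is typical for other residents.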

## Full-text entities

- **Diseases:** Infection (MESH:D007239), respiratory and urinary tract infections (MESH:D012141)
- **Chemicals:** Oxygen (MESH:D010100)

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12840934/full.md

---
Source: https://tomesphere.com/paper/PMC12840934