# Utilising unsupervised machine learning to predict outbreaks of respiratory tract infections in acute Irish hospitals (2016-2021)

**Authors:** Doaa Amin, Akke Vellinga

PMC · DOI: 10.1016/j.puhip.2026.100748 · 2026-02-07

## TL;DR

This study uses unsupervised machine learning to predict respiratory tract infection outbreaks in Irish hospitals from 2016 to 2021.

## Contribution

The novel use of k-modes clustering to identify and predict RTI outbreaks in acute hospitals.

## Key findings

- Model 2 captured all RTI outbreaks using 212 diagnostic groups.
- Five diagnostic codes accounted for two-thirds of all RTI hospitalisations.
- Monitoring these codes could alert hospitals to potential outbreaks.

## Abstract

To apply unsupervised machine learning (ML) to predict outbreaks of respiratory tract infections (RTIs) in acute Irish hospitals (2016-2021).

A retrospective study.

RTI data were extracted from the Irish Hospital In-Patient Enquiry (HIPE) system. Three k-modes clustering models were developed, and their resulting clusters were compared through graphical visualisation of the main RTIs to choose the model that best captured the outbreaks. The individual RTIs behind the outbreaks were then explored further.
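As a rough illustration of the clustering technique named above, the sketch below implements a minimal k-modes loop in plain Python. K-modes adapts k-means to categorical data by replacing cluster means with per-attribute modes and Euclidean distance with a simple matching dissimilarity (the count of attributes that differ). Everything specific here is an assumption made for illustration: the three attributes (diagnostic group, season, age band), the toy records, the choice of `k=2`, and the random initialisation are hypothetical and are not taken from the study or from HIPE.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each attribute (column) of the rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, n_iter=20, seed=0):
    """Minimal k-modes: assign records to the nearest mode, then update modes."""
    rng = random.Random(seed)
    # Hypothetical initialisation: k distinct records chosen at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # assignments stable, so stop early
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Toy, made-up admissions: (diagnostic_group, season, age_band).
records = [
    ("pneumonia", "winter", "65+"),
    ("pneumonia", "winter", "65+"),
    ("acute_lower_rti", "winter", "65+"),
    ("acute_upper_rti", "summer", "0-14"),
    ("acute_upper_rti", "summer", "0-14"),
    ("acute_upper_rti", "autumn", "0-14"),
]

labels, modes = k_modes(records, k=2)
for record, label in zip(records, labels):
    print(label, record)
print("modes:", modes)
```

In practice a maintained implementation would usually be preferable to a hand-rolled loop; this version only shows the assign-then-update structure that makes the method suitable for categorical diagnostic codes.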

Nearly half a million patients (491,099) were admitted to 55 acute Irish hospitals with an RTI. Model 2, which included 212 diagnostic groups derived by hierarchical clustering, captured all outbreaks. Further analysis identified five diagnostic codes that together accounted for two thirds of all RTI hospitalisations over the six years: acute lower RTI (28.24%), pneumonia (20.76%), chronic obstructive pulmonary disease with acute lower RTI (7.52%), COVID-19 (2020-2021) (5.13%), and acute upper RTI (4.37%).
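The "two thirds" figure can be checked directly from the five reported percentages. The short snippet below only sums the shares quoted in the abstract; the dictionary keys are informal labels chosen here, not HIPE or ICD codes.

```python
# Shares of all RTI hospitalisations (%), as reported in the abstract.
shares = {
    "acute lower RTI": 28.24,
    "pneumonia": 20.76,
    "COPD with acute lower RTI": 7.52,
    "COVID-19 (2020-2021)": 5.13,
    "acute upper RTI": 4.37,
}

total = round(sum(shares.values()), 2)
two_thirds = round(200 / 3, 2)

print(f"combined share: {total}%")        # combined share: 66.02%
print(f"two thirds:     {two_thirds}%")   # two thirds:     66.67%
```

The combined share of 66.02% is slightly below an exact two thirds, so the abstract's wording is a rounding of the reported figures.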

Unsupervised ML (k-modes clustering) can be useful in predicting RTI outbreaks in acute Irish hospitals. Further analysis identified the five RTI diagnostic codes that contributed most to outbreaks, which, if monitored, may alert hospitals to potential RTI outbreaks.

## Linked entities

- **Diseases:** pneumonia (MONDO:0005249), chronic obstructive pulmonary disease (MONDO:0005002), COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** pneumonia (MESH:D011014), COPD (MESH:D029424), sequelae of respiratory (MESH:D012131), viral haemorrhagic fever (MESH:D006482), acute laryngitis (MESH:D000208), chronic lower respiratory diseases (MESH:D012140), Influenza (MESH:D007251), RTIs (MESH:D012141), Alzheimer's disease (MESH:D000544), Tuberculosis (MESH:D014376), dementias (MESH:D003704), heart failure (MESH:D006333), diphtheria (MESH:D004165), death (MESH:D003643), mycoplasma pneumoniae (MESH:D011019), HC (MESH:D003027), COVID-19 (MESH:D000086382), infected (MESH:D007239), bronchitis (MESH:D001991)
- **Species:** HC [taxon 11103], Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12925592/full.md

---
Source: https://tomesphere.com/paper/PMC12925592