# Unveiling sub-populations in critical care settings: a real-world data approach in COVID-19

**Authors:** Wesley Anderson, Ruth Gould, Namrata Patil, Nicholas Mohr, Kenneth Dodd, Danielle Boyce, Pam Dasher, Philippe J. Guerin, Reham Khan, Sreekanth Cheruku, Vishakha K. Kumar, Ewy Mathé, Aneesh K. Mehta, Andrew P. Michelson, Andrew Williams, Smith F. Heavner, Jagdeep T. Podichetty

PMC · DOI: 10.3389/fpubh.2025.1544904 · Frontiers in Public Health · 2025-05-15

## TL;DR

This study uses real-world data to identify distinct groups of COVID-19 patients with different clinical features and outcomes, aiming to improve personalized treatment strategies.

## Contribution

The study introduces a novel use of FAMD-based clustering on real-world EHR data to identify clinically distinct subgroups of COVID-19 patients.

## Key findings

- Three distinct clusters of COVID-19 patients with unique clinical characteristics were identified.
- Hospital stay durations and survival rates varied significantly among the clusters.
- Machine learning models accurately classified the identified subgroups.

## Abstract

Disease presentation and progression can vary greatly in heterogeneous diseases, such as COVID-19, with variability in patient outcomes, even within the hospital setting. This variability underscores the need for tailored treatment approaches based on distinct clinical subgroups.

This study aimed to identify COVID-19 patient subgroups with unique clinical characteristics using real-world data (RWD) from electronic health records (EHRs) to inform individualized treatment plans.

A Factor Analysis of Mixed Data (FAMD)-based agglomerative hierarchical clustering approach was employed to analyze the real-world data, enabling the identification of distinct patient subgroups. Statistical tests evaluated cluster differences, and machine learning models classified the identified subgroups.
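The two-stage idea described above (FAMD to place numeric and categorical variables on a common scale, then agglomerative clustering on the resulting coordinates) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The abstract does not state the linkage criterion, the number of retained components, or the software used, so Ward linkage, two components, the function names `famd_coordinates` and `ward_clusters`, and the synthetic toy data are all assumptions made for this example.

```python
import numpy as np


def famd_coordinates(numeric, categorical, n_components=2):
    """Project mixed data onto FAMD components (illustrative sketch).

    numeric: (n, p) array of continuous variables (no constant columns).
    categorical: list of length-n label sequences, one per categorical variable.
    """
    numeric = np.asarray(numeric, dtype=float)
    # Numeric variables are z-scored, as in ordinary PCA.
    z = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)
    blocks = [z]
    for column in categorical:
        labels = np.asarray(column)
        levels = np.unique(labels)
        # One indicator column per level of the categorical variable.
        onehot = (labels[:, None] == levels[None, :]).astype(float)
        prop = onehot.mean(axis=0)
        # Indicators are centred and divided by sqrt(level proportion), as in MCA.
        blocks.append((onehot - prop) / np.sqrt(prop))
    x = np.hstack(blocks)
    # PCA of the combined matrix via SVD; rows of u * s are patient coordinates.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering using Ward's criterion (an assumption here)."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = points[clusters[a]].mean(axis=0)
                cb = points[clusters[b]].mean(axis=0)
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Synthetic toy data, NOT from the study: two numeric variables
# (age, respiratory rate) and one categorical variable (ventilation support).
rng = np.random.default_rng(0)
numeric = np.vstack([
    rng.normal([40, 16], 1.0, size=(5, 2)),
    rng.normal([65, 24], 1.0, size=(5, 2)),
    rng.normal([80, 32], 1.0, size=(5, 2)),
])
categorical = [["none"] * 5 + ["noninvasive"] * 5 + ["invasive"] * 5]

coords = famd_coordinates(numeric, categorical, n_components=2)
labels = ward_clusters(coords, n_clusters=3)
print(labels)
```

The naive pairwise search is cubic in the number of patients and is only suitable for small illustrations. On real EHR data, an established hierarchical clustering implementation and a principled choice of the number of components and clusters would be needed.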

Three clusters of COVID-19 patients with unique clinical characteristics were identified. Hospital stay durations and survival rates differed significantly among the clusters, with more severe clinical features correlating with worse prognoses. Machine learning classifiers achieved high accuracy in subgroup identification.

By leveraging RWD and advanced clustering techniques, the study provides insights into the heterogeneity of COVID-19 presentations. The findings support the development of classification models that can inform more individualized and effective treatment plans, which may improve patient outcomes in the future.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12119499/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12119499/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12119499/full.md

---
Source: https://tomesphere.com/paper/PMC12119499