# Prediction-guided clustering for sepsis phenotyping: a retrospective cohort analysis

**Authors:** Paul A. Hilders, Lada Lijović, Martijn Otten, Laurens A. Biesheuvel, Floor Hiemstra, Marcel van der Kuil, Ameet R. Jagesar, P. J. Thoral, Ari Ercole, Paul W. G. Elbers

PMC · DOI: 10.1186/s40635-026-00882-9 · 2026-03-18

## TL;DR

This study introduces a machine learning method to identify distinct sub-phenotypes of sepsis patients, which could lead to better personalized treatment strategies.

## Contribution

A novel prediction-guided clustering approach: a deep recurrent encoder is trained on auxiliary clinical-outcome predictions, and its patient representations are then clustered into interpretable sepsis sub-phenotypes.

## Key findings

- Six distinct sepsis sub-phenotypes with varying risk profiles and clinical presentations were identified.
- The learned representations generalized robustly across two ICU datasets (AmsterdamUMCdb and MIMIC-IV).
- Reinforcement learning suggested that each sub-phenotype is associated with a different optimal treatment strategy.

## Abstract

Sepsis is a major cause of morbidity and mortality worldwide, with its heterogeneous and dynamically evolving clinical presentation complicating diagnosis, treatment, and prognosis. The identification of clinically meaningful sub-phenotypes within the sepsis population could help tailor interventions and improve outcomes. However, existing phenotyping studies have yielded inconsistent results with limited clinical utility. In this study, we propose a novel, guided machine-learning approach to identify clinically relevant sub-phenotypes within the sepsis condition by integrating deep representation learning with prediction-guided clustering to capture temporal disease trajectories.

We trained a recurrent neural network-based encoder to generate compact, predictive representations of sepsis patients over time. During training, the encoder was guided by four auxiliary prediction objectives (90-day mortality, remaining length of stay, need for mechanical ventilation, and need for renal replacement therapy), which encouraged the model to learn representations relevant to patient-centred outcomes. After training, the patient representations were clustered using the K-means algorithm. The identified sub-phenotypes were compared across two large ICU data sets (AmsterdamUMCdb and MIMIC-IV) and interpreted using Integrated Gradients-based attribution maps. Finally, the practical and clinical utility of the sub-phenotypes was assessed with a reinforcement learning framework that estimated optimal treatment strategies within each sepsis sub-phenotype.
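As a rough illustration of how four auxiliary objectives can shape a patient representation, the sketch below runs a plain tanh RNN over one patient's hourly measurements and attaches one linear head per outcome. It then sums binary cross-entropy losses for 90-day mortality, mechanical ventilation, and renal replacement therapy with a squared-error loss for remaining length of stay. The cell type, hidden size, feature count, per-time-step labels, equal loss weights, and every name (`encode`, `predict`, `multitask_loss`) are hypothetical assumptions, not the authors' implementation. Only the forward pass is shown; training would backpropagate this loss with an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8   # hypothetical number of vitals/labs per time step
HIDDEN = 16      # hypothetical size of the patient representation
OBJECTIVES = ("mortality_90d", "remaining_los", "mech_vent", "rrt")
BINARY = {"mortality_90d", "mech_vent", "rrt"}  # remaining_los is a regression target

# Randomly initialised weights (assumption: a plain tanh RNN cell); a real model
# would learn these by backpropagating the multi-task loss with autodiff.
W_x = rng.normal(scale=0.1, size=(N_FEATURES, HIDDEN))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
b_h = np.zeros(HIDDEN)
heads = {name: (rng.normal(scale=0.1, size=HIDDEN), 0.0) for name in OBJECTIVES}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(x):
    """Plain tanh RNN over x of shape (T, N_FEATURES) -> states (T, HIDDEN)."""
    h = np.zeros(HIDDEN)
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b_h)
        states.append(h)
    return np.stack(states)


def predict(states):
    """Apply one linear head per auxiliary objective at every time step."""
    preds = {}
    for name, (w, b) in heads.items():
        z = states @ w + b
        preds[name] = sigmoid(z) if name in BINARY else z
    return preds


def multitask_loss(preds, targets, weights=None):
    """Weighted sum of BCE (binary outcomes) and MSE (remaining length of stay)."""
    weights = weights or {name: 1.0 for name in preds}  # equal weights (assumption)
    eps = 1e-7
    total = 0.0
    for name, p in preds.items():
        y = targets[name]
        if name in BINARY:
            p = np.clip(p, eps, 1 - eps)
            total += weights[name] * -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        else:
            total += weights[name] * np.mean((p - y) ** 2)
    return float(total)


# Toy admission: 12 hourly steps of standardised measurements (synthetic data).
x = rng.normal(size=(12, N_FEATURES))
states = encode(x)
preds = predict(states)
targets = {
    "mortality_90d": np.zeros(12),
    "remaining_los": np.linspace(5.0, 0.5, 12),  # days left, shrinking over time
    "mech_vent": np.ones(12),
    "rrt": np.zeros(12),
}
loss = multitask_loss(preds, targets)
```

The rows of `states` are the per-time-step representations that would subsequently be clustered; because every head reads from the same state, minimising the combined loss pushes the encoder toward features that are informative for all four outcomes at once.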

Through our approach, we identified six clinically distinct sub-phenotypes with varying risk profiles and presentations. The learned representations demonstrated robust generalisability across the different data sets, and the reinforcement learning results indicated that the different sub-phenotypes were associated with different optimal treatment strategies, highlighting the potential for phenotype-informed decision-making.
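The six sub-phenotypes come from running K-means on the encoder's patient representations. Below is a minimal NumPy version of Lloyd's algorithm with k-means++ seeding, applied to synthetic, well-separated 16-dimensional points that stand in for real representations. The seeding strategy, iteration cap, and data are assumptions, because the abstract does not specify them; in practice one would more likely call `sklearn.cluster.KMeans(n_clusters=6)`.

```python
import numpy as np


def _sq_dists(z, centroids):
    """Squared Euclidean distance from every point to every centroid: (n, k)."""
    return ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_init(z, k, rng):
    """k-means++ seeding: sample each new centroid with probability proportional to D^2."""
    centroids = [z[rng.integers(len(z))]]
    for _ in range(1, k):
        d2 = _sq_dists(z, np.array(centroids)).min(axis=1)
        centroids.append(z[rng.choice(len(z), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(z, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids (k, d), labels (n,))."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(z, k, rng)
    for _ in range(n_iter):
        labels = _sq_dists(z, centroids).argmin(axis=1)      # assignment step
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j]                     # keep empty clusters
                        for j in range(k)])                   # update step
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, _sq_dists(z, centroids).argmin(axis=1)


# Synthetic stand-in for encoder outputs: six well-separated blobs in 16-D.
rng = np.random.default_rng(1)
centres = 50.0 * np.eye(6, 16)
z = np.concatenate([c + 0.1 * rng.normal(size=(50, 16)) for c in centres])
true_group = np.repeat(np.arange(6), 50)
centroids, labels = kmeans(z, k=6)
```

Because K-means minimises within-cluster squared Euclidean distance, the resulting phenotypes depend directly on how the encoder arranges patients in representation space, which is why the prediction-guided training step matters.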

This study introduces a flexible and effective framework for the identification of robust and clinically meaningful sub-phenotypes within the population of sepsis patients. Moreover, the identified sub-phenotypes are clinically interpretable, and the proposed trajectory-aware phenotyping approach may support the future development of personalised and precision medicine strategies.
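The interpretability of the sub-phenotypes rests on Integrated Gradients, which attributes a model output $F(x)$ to each input feature by integrating gradients along a straight path from a baseline $x'$ to $x$: $\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i}\,d\alpha$. The sketch below approximates this integral with a midpoint Riemann sum for a hypothetical logistic risk model whose gradient is known in closed form. The weights, the all-zeros baseline, and the step count are assumptions; the paper applies attribution to its recurrent encoder over time-series inputs, which this static toy does not reproduce.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical risk model: logistic regression on standardised features.
w = np.array([1.5, -0.8, 0.3, 2.0])
b = -0.5


def model(x):
    return sigmoid(x @ w + b)


def model_grad(x):
    """Closed-form gradient of sigmoid(w.x + b) with respect to x."""
    p = model(x)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


x = np.array([1.0, 0.5, -1.0, 0.8])
baseline = np.zeros_like(x)  # assumption: all-zeros baseline after standardisation
attr = integrated_gradients(x, baseline, model_grad)
```

A useful sanity check is completeness: the attributions should sum to approximately $F(x) - F(x')$, which holds here up to the discretisation error of the Riemann sum.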

The online version contains supplementary material available at 10.1186/s40635-026-00882-9.

## Figures

The complete paper includes 3 captioned figures: https://tomesphere.com/paper/PMC13000074/full.md

---
Source: https://tomesphere.com/paper/PMC13000074