# Machine Learning-Driven Prediction of Intensive Care Units Mortality and Length of Stay: A 11-Year Retrospective Study in Hong Kong Public Hospitals

**Authors:** Ying Zhao, Xincheng Shu, Chi-Sing Leung, Eric W. M. Wong, Qi Xuan, Kar-Lung Lee, Anne Leung, Lowell Ling, Hoi-Ping Shum, Wing-Lun Wan, Pauline Yeung Ng, Tsz-Kin Yim, Wai-Ming Tang, Kenny King-Chung Chan, Gavin Joynt

PMC · DOI: 10.1007/s10916-026-02355-8 · 2026-03-10

## TL;DR

This study uses machine learning to predict ICU mortality and length of stay (LOS) in 15 Hong Kong public hospitals, outperforming the APACHE scoring systems.

## Contribution

The study proposes a novel machine learning pipeline that predicts ICU mortality and length of stay (LOS) more accurately than the Acute Physiology and Chronic Health Evaluation (APACHE) systems.

## Key findings

- CatBoost achieved the highest AUROC (0.9070) and the lowest Brier score (0.0827) for mortality prediction, and the lowest mean absolute error (MAE, 2.6364) for LOS prediction.
- The pipeline outperformed the APACHE systems in predicting ICU mortality and LOS in Hong Kong hospitals.
- Age, Glasgow Coma Scale (GCS), and urine output were the top features for mortality prediction; for LOS, the top features were creatinine level and indicators of whether the lowest and highest respiratory rates were ventilator-measured.
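The findings above are stated in terms of AUROC, Brier score and MAE. The sketch below shows the standard definitions of these metrics in plain Python, plus an observed-to-expected (O/E) death ratio, one common way to express the kind of mortality overestimation the abstract attributes to APACHE (an O/E below 1 means more deaths were predicted than occurred). It is an illustration, not the authors' code: the function names are hypothetical, and in practice scikit-learn's `roc_auc_score`, `brier_score_loss` and `mean_absolute_error` compute the first three metrics.

```python
from typing import List, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic (tied scores get average ranks).

    Equals the probability that a randomly chosen death (label 1) receives a
    higher predicted risk than a randomly chosen survivor (label 0).
    """
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores order[i..j] and give each its mean 1-based rank.
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one death and one survivor")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted death probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def mae(los_true: Sequence[float], los_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted LOS, in whatever unit LOS is recorded."""
    return sum(abs(p - t) for t, p in zip(los_true, los_pred)) / len(los_true)


def observed_to_expected(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Observed deaths divided by the sum of predicted death probabilities.

    A ratio below 1 means the model predicts more deaths than actually occur.
    """
    return sum(y_true) / sum(y_prob)


if __name__ == "__main__":
    y = [0, 0, 1, 1]            # toy outcomes: 1 = died in ICU
    p = [0.1, 0.4, 0.35, 0.8]   # toy predicted death probabilities
    print(auroc(y, p))  # 0.75
    print(brier_score(y, p))
    print(mae([2, 5, 10], [3, 4, 7]))
    print(observed_to_expected(y, p))
```

The rank-based AUROC runs in O(n log n), so it stays practical on a cohort the size of the one described here, unlike a direct loop over every death/survivor pair.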

## Abstract

This study aims to develop a machine learning (ML)-based pipeline to predict intensive care unit (ICU) mortality and length of stay (LOS). A dataset of 140,904 ICU admissions was collected from 15 public hospitals in Hong Kong over an 11-year period. The proposed pipeline deploys a suite of ML models to predict mortality and LOS, and their performance was compared with that of the Acute Physiology and Chronic Health Evaluation (APACHE) systems on the collected dataset using five-fold cross-validation. Among all models evaluated, Gradient Boosting with Categorical Features (CatBoost) achieved the highest area under the receiver operating characteristic curve (AUROC) of 0.9070 and the lowest Brier score of 0.0827 for mortality prediction, as well as the lowest Mean Absolute Error (MAE) of 2.6364 for LOS prediction. SHapley Additive exPlanations (SHAP) analysis of CatBoost revealed that age, Glasgow Coma Scale (GCS) and urine output were the three most important features for mortality prediction, whereas the three most important features for LOS prediction were creatinine level and two indicators of whether the lowest and highest respiratory rates were ventilator-measured. We further performed temporal validation and an in-depth analysis of CatBoost's predictive performance across subsets grouped by age and hospital. Our results demonstrate that the proposed pipeline mitigates the overestimation of mortality by the APACHE systems in Hong Kong. Moreover, the proposed ML-based predictive pipeline offers a transferable framework for researchers to develop models tailored to their local medical environments.
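The abstract says models were compared with five-fold cross-validation and then checked with temporal validation, but this summary does not say how the folds were formed or where the temporal cutoff falls. The sketch below is a dependency-free illustration under two stated assumptions: folds are stratified by the mortality label, and temporal validation trains on admissions before a hypothetical `cutoff_year` and tests on later ones. The function names (`stratified_kfold`, `cv_splits`, `temporal_split`) are hypothetical; scikit-learn's `StratifiedKFold` provides equivalent fold assignment in practice.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split admission indices into k folds that each keep roughly the overall death rate."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so deaths and survivors spread evenly across folds.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def cv_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield sorted(train), sorted(test)


def temporal_split(admit_years: Sequence[int], cutoff_year: int) -> Tuple[List[int], List[int]]:
    """Train on admissions before cutoff_year; test on admissions from cutoff_year onward."""
    train = [i for i, yr in enumerate(admit_years) if yr < cutoff_year]
    test = [i for i, yr in enumerate(admit_years) if yr >= cutoff_year]
    return train, test


if __name__ == "__main__":
    labels = [0] * 80 + [1] * 20  # toy data: 20% mortality
    folds = stratified_kfold(labels, k=5, seed=1)
    for train, test in cv_splits(folds):
        deaths = sum(labels[i] for i in test)
        print(len(train), len(test), deaths)  # 80 20 4 for every fold
    print(temporal_split([2010, 2015, 2019, 2020], cutoff_year=2019))  # ([0, 1], [2, 3])
```

Each `(train, test)` pair would be used to fit a model such as CatBoost on the training indices and score it on the held-out fold, with metrics averaged over the five folds. The temporal split complements this by checking whether a model trained on earlier years still performs on later admissions.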

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** Coma (MESH:D003128), cardiovascular and respiratory diseases (MESH:D012140), Death (MESH:D003643), COVID-19 (MESH:D000086382)
- **Chemicals:** creatinine (MESH:D003404; abbreviated Creat)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12975804/full.md

---
Source: https://tomesphere.com/paper/PMC12975804