# Assessing in-hospital mortality risk in ICU lung cancer patients using machine learning: An analysis based on the MIMIC-IV database

**Authors:** Jianwei Wang, Lizhen Lin, Li-ping Qiu, Li-lan Zheng, Lu-xi Wu, Hui Lv, Haihua Xie

PMC · DOI: 10.1371/journal.pone.0341259 · PLOS One · 2026-01-22

## TL;DR

This study uses machine learning to predict in-hospital mortality risk for ICU lung cancer patients, identifying key factors like hospital stay duration and SAPS II score.

## Contribution

The novel contribution is the development of an interpretable machine learning model using MIMIC-IV data to predict mortality in ICU lung cancer patients.

## Key findings

- The XGBoost model achieved an AUC of 0.865 in the training cohort and 0.790 in the test cohort for predicting in-hospital mortality.
- Hospital stay duration and SAPS II score were identified as the most influential predictors across multiple models.
- SHAP analysis enhanced model interpretability, showing the direction and magnitude of key predictors.

## Abstract

Patients with advanced lung cancer admitted to the intensive care unit (ICU) face a substantially elevated risk of in-hospital mortality. Early identification of high-risk individuals is essential to support timely clinical decision-making. This study aimed to develop and validate a predictive model using machine learning (ML) techniques to estimate in-hospital mortality in this patient population.

Clinical data were obtained from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. Features were selected using least absolute shrinkage and selection operator (LASSO) regression, and the selected features were used to construct eight ML models: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), artificial neural network (ANN), extreme gradient boosting (XGBoost), k-nearest neighbors (k-NN), adaptive boosting (AdaBoost), and random forest (RF). Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. Discrimination, calibration, and clinical utility were also evaluated. The final model incorporated 27 clinically interpretable variables, including not only established severity scores (e.g., SAPS II) but also dynamic treatment factors (e.g., vasopressin use, mechanical ventilation duration) that reflect real-world ICU practice. SHAP (SHapley Additive exPlanations) analysis was used to improve interpretability, allowing clinicians to see both the magnitude and the direction of key predictors. This is an improvement over the black-box ML applications of prior studies.
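SHAP values are Shapley values of a model's prediction: each feature's contribution is its average marginal effect across all coalitions of the other features. Libraries typically use faster algorithms (such as TreeSHAP for tree ensembles like XGBoost), but the idea can be shown exactly on a small model. The sketch below is illustrative only. `toy_risk`, its weights, the feature names `saps_ii` and `vasopressin`, and the patient and baseline values are all hypothetical rather than taken from the paper. It also assumes one common convention in which features outside a coalition are set to a baseline value.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def toy_risk(x: Features) -> float:
    """Hypothetical risk score with an interaction term (made-up weights, NOT the paper's model)."""
    return 0.02 * x["saps_ii"] + 0.5 * x["vasopressin"] + 0.01 * x["saps_ii"] * x["vasopressin"]


def shapley_values(model: Callable[[Features], float], x: Features, baseline: Features) -> Features:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)
    phi: Features = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                present = set(coalition)
                without_f = {g: x[g] if g in present else baseline[g] for g in names}
                with_f = {**without_f, f: x[f]}  # same coalition, plus feature f
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


patient = {"saps_ii": 60.0, "vasopressin": 1.0}    # hypothetical patient
reference = {"saps_ii": 40.0, "vasopressin": 0.0}  # hypothetical baseline
phi = shapley_values(toy_risk, patient, reference)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    # positive values push predicted risk up, negative values push it down
    print(f"{name}: {value:+.3f}")
```

The sign of each value gives the direction of a predictor's effect and its absolute value gives the magnitude, which is what a SHAP summary plot displays across many patients. The values also sum to `toy_risk(patient) - toy_risk(reference)`. This additivity is what lets a prediction be decomposed feature by feature.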

Among the 1,755 patients included, 368 (21%) died during hospitalization in the training cohort. Notably, older individuals, particularly those of Caucasian descent, were more susceptible to in-hospital mortality. LASSO regression retained 27 variables associated with in-hospital mortality, such as gender and hospital stay duration. The XGBoost model achieved the highest predictive performance, with an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 (95% CI: 0.840–0.891) in the training cohort. The test cohort showed similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790 (95% CI: 0.741–0.840). Key predictors identified consistently across models (LR, SVM, ANN, and XGBoost) included hospital stay duration, Simplified Acute Physiology Score II (SAPS II), use of norepinephrine and vasopressin, prothrombin time (PT), mechanical ventilation duration, white blood cell count (WBC), and blood urea nitrogen (BUN). The SHAP summary plot further illustrated the direction and magnitude of influence for the top 15 predictors.
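The metrics reported above can be computed from predicted mortality probabilities as follows. This is a minimal sketch on made-up toy data. The function names, the 0.5 decision threshold, and the use of a percentile bootstrap for the AUC confidence interval are all assumptions. This summary does not state how the paper's 95% CIs were obtained, and DeLong's method is another common choice.

```python
import random
from typing import Dict, List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random death outscores random survivor); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 at an assumed probability threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_death = s >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float], n_boot: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one outcome class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels (1 = in-hospital death) and predicted probabilities; not study data.
y = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
p = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.5, 0.8, 0.35]
print(round(auc(y, p), 3), threshold_metrics(y, p), bootstrap_auc_ci(y, p))
```

The AUC is threshold-free, whereas accuracy, sensitivity, specificity, and F1 all depend on the chosen cut-off. This is one reason a model can have a high AUC but a modest F1 score when deaths are the minority class, as with the roughly 21% mortality here.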

The XGBoost-based model showed the best performance in predicting in-hospital mortality among critically ill lung cancer patients. Hospital stay duration and SAPS II score emerged as the most influential predictors and could serve as the basis for a simplified clinical risk score. These findings may support early risk stratification and guide clinical decision-making in the ICU. Because the analysis relied exclusively on internal splits of the MIMIC-IV data, the model's generalizability, and thus its applicability in broader clinical contexts, remains limited.

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Genes:** ALPP (alkaline phosphatase, placental) [NCBI Gene 250] {aka ALP, PALP, PLAP, PLAP-1}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, SKAP2 (src kinase associated phosphoprotein 2) [NCBI Gene 8935] {aka PRAP, RA70, SAPS, SCAP2, SKAP-HOM, SKAP55R}, GPT (glutamic--pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}, PDX1 (pancreatic and duodenal homeobox 1) [NCBI Gene 3651] {aka GSF, IDX-1, IPF1, IUF1, MODY4, PAGEN1}, SLC17A5 (solute carrier family 17 member 5) [NCBI Gene 26503] {aka AST, ISSD, NSD, SD, SIALIN, SIASD}
- **Diseases:** Lung cancer (MESH:D008175), sepsis (MESH:D018805), organ dysfunction (MESH:D009102), cancer (MESH:D009369), peptic ulcer disease (MESH:D010437), stage IIIB/IV (MESH:C566890), low blood pressure (MESH:D007022), critically ill (MESH:D016638), stage I NSCLC (MESH:D062706), infection (MESH:D007239), SCLC (MESH:D018288), renal disease (MESH:D007674), diabetes (MESH:D003920), chronic pulmonary disease (MESH:D002908), metastasis (MESH:D009362), died (MESH:D003643), congestive heart failure (MESH:D006333), liver disease (MESH:D008107)
- **Chemicals:** dopamine (MESH:D004298), sodium (MESH:D012964), creatinine (MESH:D003404), urea nitrogen (MESH:C530477), bicarbonate (MESH:D001639), calcium (MESH:D002118), epinephrine (MESH:D004837), norepinephrine (MESH:D009638), potassium (MESH:D011188), losartan (MESH:D019808), oxygen (MESH:D010100), bilirubin (MESH:D001663)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12826459/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12826459/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12826459/full.md

---
Source: https://tomesphere.com/paper/PMC12826459