Assessing in-hospital mortality risk in ICU lung cancer patients using machine learning: An analysis based on the MIMIC-IV database

Jianwei Wang; Lizhen Lin; Li-ping Qiu; Li-lan Zheng; Lu-xi Wu; Hui Lv; Haihua Xie

PMC · DOI:10.1371/journal.pone.0341259·January 22, 2026

Assessing in-hospital mortality risk in ICU lung cancer patients using machine learning: An analysis based on the MIMIC-IV database

Jianwei Wang, Lizhen Lin, Li-ping Qiu, Li-lan Zheng, Lu-xi Wu, Hui Lv, Haihua Xie

PDF

Open Access

TL;DR

This study uses machine learning to predict in-hospital mortality risk for ICU lung cancer patients, identifying key factors like hospital stay duration and SAPS II score.

Contribution

The novel contribution is the development of an interpretable machine learning model using MIMIC-IV data to predict mortality in ICU lung cancer patients.

Findings

01

The XGBoost model achieved an AUC of 0.865 in the training cohort and 0.790 in the test cohort for predicting in-hospital mortality.

02

Hospital stay duration and SAPS II score were identified as the most influential predictors across multiple models.

03

SHAP analysis enhanced model interpretability, showing the direction and magnitude of key predictors.

Abstract

Patients with advanced lung cancer admitted to the intensive care unit (ICU) face a substantially elevated risk of in-hospital mortality. Early identification of high-risk individuals is essential to support timely clinical decision-making. This study aimed to develop and validate a predictive model using machine learning (ML) techniques to estimate in-hospital mortality in this patient population. Clinical data were obtained from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression, enabling the construction of eight ML models: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), artificial neural network (ANN), extreme gradient boosting (XGBoost), k-nearest neighbors (k-NN), adaptive boosting (AdaBoost), and random forest…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Figures6

Click any figure to enlarge with its caption.

Fig 2 — Lasso regression-based variable screening.

Fig 3 — The performance of the eight in-hospital mortality predictive models.

Fig 4 — Calibration capability and clinical beneft of the model.

Fig 5 — The top 36 features derived from XGBoost.

Fig 6 — SHAP value distribution: Direction and magnitude of feature contributions.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSepsis Diagnosis and Treatment · Lung Cancer Diagnosis and Treatment · Machine Learning in Healthcare

Full text

Introduction

Lung cancer is a leading cause of cancer-related mortality worldwide, with a significant number of patients facing poor prognoses due to diagnoses made at advanced stages and inadequate treatment options [1–3]. The prognosis for lung cancer patients in the intensive care unit (ICU) is particularly bleak, with studies indicating in-hospital mortality rates that can be as high as 69% [4]. Moreover, identifying prognostic factors, such as performance status and biochemical markers, is essential for developing effective care strategies and prioritizing ICU admissions. Recent research has highlighted the increased mortality rates among lung cancer patients admitted to the ICU, thereby emphasizing the urgent need for reliable early prediction and risk stratification methods to identify high-risk individuals and improve clinical outcomes [5–7].By filling this knowledge gap, we can better allocate ICU resources, optimize treatment plans, and offer more accurate prognoses to patients and their families. Therefore, there is an immediate need for research that focuses specifically on predicting in – hospital mortality in this vulnerable patient population.

Machine learning (ML) offers a promising alternative due to its ability to analyze complex datasets, revealing patterns and relationships that may remain hidden in traditional statistical analyses [8,9]. This highlights the necessity for dependable predictive models that can accurately identify patients at increased risk of mortality during hospitalization. As an innovative form of artificial intelligence, ML has the potential to transform measurement results into relevant predictive models, particularly in oncology, aided by rapid advancements in large datasets and deep learning techniques. Recent findings demonstrate that ML has been effective in predicting lung cancer susceptibility, recurrence, survival rates, and the prognosis of malignant tumors [10–13]. However, there is a notable lack of data regarding in-hospital mortality risk prediction models utilizing ML approaches specifically for lung cancer patients in an ICU setting.

Previous studies may have only used a single or a few machine – learning models to predict the in – hospital mortality of critically ill lung cancer patients. In this study, eight different machine – learning models were constructed, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), eXtreme Gradient Boosting (XGBoost), k – Nearest Neighbors (k – NN), Adaptive Boosting (AdaBoost), and Random Forest (RF). By comparing multiple models, it is possible to comprehensively evaluate the performance of different models in predicting the in – hospital mortality of this specific patient group, and thus select the optimal model.

Despite the high accuracy levels achieved by these ML models, the specific contributions of individual variables to their performance often remain unclear. This ambiguity limits the practical application of ML methodologies in clinical environments [14]. Shapley Additive ExPlanations (SHAP) provides a solution by integrating optimal credit allocation with localized interpretations, visually elucidating the importance of each variable within the model [15], thereby enhancing interpretability.

This study aims to utilize the extensive data available in the MIMIC-IV database to develop a robust ML model for predicting in-hospital mortality in lung cancer patients. The model will be validated using various algorithms, including logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), artificial neural network (ANN), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), adaptive boosting (AdaBoost), and random forest (RF), thereby allowing a comprehensive assessment of its predictive capability. The identification of significant risk factors associated with in-hospital mortality will be prioritized, with the visual interpretation of the model using SHAP techniques to assist clinicians in identifying and intervening with high-risk populations.

Methods

Data source

This retrospective cohort study utilized data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV, v2.2) (https://physionet.org/content/mimiciv/2.1/) [16,17], a clinical database that includes 730,141 ICU admissions from Beth Israel Deaconess Medical Center between 2008 and 2019, located in the United States. Since the study utilized non-identifiable, publicly available data, it received an exemption from informed consent requirements and Institutional Review Board (IRB) approval in accordance with ethical guidelines. All research procedures adhered to the ethical principles outlined in the 1964 Declaration of Helsinki and its subsequent amendments. Before data extraction, the research team completed the required web-based training modules and successfully passed the Protecting Human Research Participants certification exam, thereby obtaining authorized access to the MIMIC-IV database. A verbal notification was submitted to the institutional ethics committee, waiving the need for formal approval due to the retrospective, anonymized nature of the study.

Cohort selection

Individuals meeting any of the following criteria were excluded from the study: (1) those under 18 years of age at the time of their initial ICU admission; (2) patients with multiple ICU admissions; (3) cases with over 80% of personal data missing. We employed a random selection from the MIMIC-IV database to create both the training and test cohorts. Ultimately, this study included 1,288 patients in the training cohort and 527 patients in the test cohort, as depicted in the detailed flowchart presented in Fig 1.

The flowchart of this study.

Data collection and outcomes

Data extraction from the database regarding patients admitted to the ICU within the first 24 hours was conducted using structured query language (SQL) in PostgreSQL. The variables considered in this investigation included the following categories: (1) Demographic information: gender, race, admission age, and weight; (2) Laboratory indicators: blood urea nitrogen (BUN), prothrombin time (PT), red blood cell count (RBC), red cell distribution width (RDW), partial thromboplastin time (PTT), chloride, anion gap, hemoglobin, white blood cell count (WBC), platelet count, creatinine, potassium, sodium, calcium; (3) Comorbidities: congestive heart failure, chronic pulmonary disease, peptic ulcer disease, diabetes, renal disease, Charlson comorbidity index, mild liver disease; (4) Vital signs: heart rate (HR), respiratory rate (RR), systolic blood pressure (SBP), diastolic blood pressure (DBP), temperature, percutaneous arterial oxygen saturation (SpO2); (5) Scoring systems: sequential organ failure assessment (SOFA) score, glasgow coma scale (GCS), acute physiology score III (APS III); (6) Interventions: continuous renal replacement therapy (CRRT), mechanical ventilation, vasopressin, losartan, epinephrine, dopamine, norepinephrine. For multiple measurements, the maximum and minimum values recorded on the first day were utilized, with the exception of SpO2. To reduce the impact of missing data on model development, the KNN Imputer method was employed to impute values for data with less than 30% missingness, whereas data with over 30% missingness were excluded from the analysis. The primary outcome of interest was in-hospital mortality.

Statistical analysis

The Kolmogorov–Smirnov test was employed to analyze continuous variables. Given that all continuous variables demonstrated non-normal distributions, data were summarized using the median and interquartile range. To assess differences between groups, the Mann–Whitney U test was applied. Categorical variables were reported as percentages (%), and Pearson chi-squared tests were conducted to evaluate group differences. To address the class imbalance observed in the dependent variables, under-sampling techniques were implemented to readjust the dataset for balance. The sample data were partitioned into a training set and an internal validation set using fivefold cross-validation sampling. In scenarios involving numerous features, the LASSO technique was utilized for feature selection on the training set, which incorporates L1 regularization to select relevant features and reduce dimensionality by compressing coefficients, thereby identifying features with significant contributions whereas discarding redundant ones.

Development of predictive models for in-hospital mortality

This research implemented eight ML algorithms: LR, SVM, GBM, ANN, XGBoost, KNN, AdaBoost, and RF to develop a predictive model for in-hospital mortality. Use one-hot encoding to handle categorical variables in LR, SVM, ANN, and KNN, while label encoding is recommended for GBM, XGBoost, AdaBoost, and RF.The variables selected through lasso regression were incorporated into the model. To ensure the model’s robustness, ten-fold cross-validation was utilized. Parameter tuning was conducted using grid search to identify the optimal tuning parameters for each algorithm. During the parameter adjustment phase, the model exhibiting the highest area under the curve (AUC) from the receiver operating characteristic (ROC) analysis was identified as the most effective model. The models were constructed using the training dataset, followed by validation of the best model using both internal and external validation datasets.

LASSO regression can effectively identify these key features with linear relationships and retain them. Subsequently, XGBoost can further explore the possible non – linear interaction effects between features on the basis of these selected features. In this way, the hybrid approach combines the advantages of both models, leveraging LASSO regression’s effective capture of linear relationships and XGBoost’s ability to mine non – linear information between features, thereby improving the overall performance of the model.

Performance assessment

The evaluation of the predictive model’s performance was conducted utilizing the AUC of the ROC curve, as well as metrics such as sensitivity, specificity, F1 score, accuracy, and threshold values. Furthermore, decision curve analysis (DCA) and calibration curves were generated to illustrate the true clinical applicability of the model.

SHAP analysis

To further investigate the positive or negative effects of the significant features identified for predicting in-hospital mortality, a SHAP analysis was executed using Python version 3.7.0. The SHAP value represents the predicted contribution of each feature within the dataset.

Results

Baseline characteristics

Following a thorough screening process, a cohort of 1,755 patients from the MIMIC-IV database was identified for inclusion in this study, of which 368 individuals (21%) died after admission to the ICU. As presented in Table 1, the baseline characteristics of the enrolled patients in the MIMIC-IV database were summarized. Notably, older individuals, particularly those of Caucasian descent, demonstrated a higher susceptibility to mortality during their hospital stay. These patients also experienced extended hospital stays following ICU admission, coupled with significantly elevated in-hospital mortality rates.

Table 1: Characteristic at baseline between survivors and non-survivors group in MIMIC- IV.

Furthermore, patients categorized as Non-Survival exhibited elevated rates of Charlson comorbidity and mild liver disease. On the first day following admission to the ICU, Non-Survival patients were more likely to necessitate interventions, including the administration of vasopressors, CRRT, mechanical ventilation, and medications such as epinephrine, norepinephrine, and dopamine. Laboratory evaluations and vital sign measurements indicated that patients who did not survive demonstrated significantly lower levels of RBC, hemoglobin, SpO2, pH, base excess, albumin, anion gap, bicarbonate, and calcium. In contrast, these patients exhibited elevated concentrations of BUN, potassium, creatinine, total bilirubin, alanine aminotransferase (ALT), alkaline phosphatase (ALP), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), PT, international normalized ratio (INR), WBC, RDW and the anion gap when compared to lung cancer patients who survived.

Feature selection

Lasso regression was employed to identify relevant features within the training dataset, with the characteristics of the variable coefficients illustrated in Fig 2. An iterative analysis employed a tenfold cross-validation approach. This analysis revealed that 27 variables demonstrated a significant correlation with lung cancer, including gender, hospital length of stay, age at admission, race, BUN, potassium, sodium, creatinine, PT, RBC, WBC, platelet count, RDW, hemoglobin, PTT, chloride, anion gap, calcium, heart_rate, SBP, DBP, resp_rate, temperature, SpO2, congestive heart failure, peptic ulcer disease, Charlson comorbidity index, mild liver disease, SOFA score, SAPS II score, GCS, ventilator hours, vasopressin usage, epinephrine usage, norepinephrine usage, and dopamine usage.

Lasso regression-based variable screening.

Model performance comparisons

In our study, we developed eight ML models to predict in-hospital mortality using a comprehensive set of features. The performance of these models is summarized in Table 2. Notably, the RF, KNN, and GBM models exhibited suboptimal predictive capabilities, as indicated by overfitting in the ROC curves. In contrast, the XGBoost model emerged as the most effective, achieving an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 within the training cohort. The performance metrics for the test cohort reflected similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790 (refer to Table 2****). Furthermore, we conducted a ROC analysis to validate the predictive capabilities of these eight models concerning in-hospital mortality. As illustrated in Fig 3A and 3B, the RF, KNN, and GBM models consistently demonstrated the least predictive power, whereas the XGBoost model exhibited superior performance across both the training and test cohorts.

Table 2: Performance of the prediction models using all features.

The performance of the eight in-hospital mortality predictive models.

The calibration curves for all eight models are depicted in Fig 4A, C, providing essential insights into their predictive accuracy. Among the evaluated models, five—excluding the RF, KNN, and GBM—exhibited commendable agreement between predicted probabilities and actual observed results. Regarding clinical utility, all models, apart from the RF, KNN, and GBM, demonstrated significant net benefit across a broad spectrum of threshold probabilities. Notably, the XGBoost model provided the highest net benefit, establishing it as the most effective model for forecasting in-hospital mortality in lung cancer patients (Fig 4B, D).

Calibration capability and clinical beneft of the model.

Feature importance analysis

To clarify the significant features influencing model outputs, an analysis of feature importance was conducted. The top 36 features identified from the LR, SVM, Neural Network, and XGBoost models are presented in Fig 4. Within the XGBoost framework,hospital stay duration (los_hospital) and the SAPS II emerged as the most critical factors, followed by norepinephrine administration, vasopressin, PT, ventilator hours, WBC, and BUN levels (Fig 5****). In the LR model, the SAPS II was recognized as the most significant predictor of in-hospital mortality; additionally, ventilator hours, length of stay, norepinephrine administration, vasopressin usage, PT, and WBC count also substantially contributed to mortality predictions (S1 Fig). Furthermore, in the SVM and Neural Network models, the SAPS II and ventilator hours were similarly identified as having considerable effects on the prediction of in-hospital mortality, as illustrated in S2 and S3 Fig.

The top 36 features derived from XGBoost.

Interpretability analysis

To clarify the overall effects—both positive and negative—on model results and to investigate the similarities and differences in the significant characteristics of critically ill lung cancer patients across varying severity levels, the SHAP summary chart was employed. Fig 6A illustrates a detailed SHAP bee swarm plot that displays the impact of different features on the XGBoost model’s output.The horizontal axis represents the SHAP values, whereas the vertical axis ranks the features according to their cumulative effect on the SHAP values. Each data point corresponds to a unique instance, with its position along the x-axis reflecting the SHAP value associated with that specific instance and feature.

SHAP value distribution: Direction and magnitude of feature contributions.

The plot clearly ranks the features by their overall impact on the model’s predictions. Features at the top of the plot (e.g., los_hospital, vasopressin_used, sapsii_idx1, norepinephrine_used) are the most important.Hospital stay duration(los_hospital) appears to be the most important feature. Most of the points are clustered around zero, but the color variance suggests a complex relationship to other variables. sapsii_idx1 is likely a severity score. It shows the patients with high severity scores (yellow dots) have a wider spread of shap values, which suggest a positive association.vasopressin_used and norepinephrine_used indicate the use of vasopressors, likely used to treat low blood pressure. Higher values of these features (yellow dots) tend to have positive SHAP values, suggesting that their use is associated with an increased predicted value.

As depicted in Fig 6B, S4 Fig sapsii_idx1 = 32 is a severity score. A value of 32 contributes −0.0091. the SAPS II score was identified as the most significant feature among the top 15 in the XGBoost model; a higher SAPS II score correlates with an increased likelihood of in-hospital mortality. This finding highlights the necessity of prioritizing the SAPS II score when predicting in-hospital mortality rates.All the features shown in the graph pushes the prediction to the left, meaning that each of them contributed to decreasing the value from the average to the final prediction value.We have added a correlation heatmap and a table of correlation coefficients for the top features (SAPS II, hospital stay duration, PT, and norepinephrine use) in the S5 Fig.Our analysis shows SAPS II shows almost no correlation with hospital stay duration(0.0778) and do not appear to pose a risk to the interpretability of the model.

Discussion

In contemporary healthcare, the accurate prediction of in – hospital mortality for intensive care unit (ICU) patients with advanced lung cancer is of paramount importance. This study, leveraging the rich data within the Medical Information Mart for Intensive Care – IV (MIMIC – IV) database, aimed to develop and validate machine learning (ML) models for this purpose. Our findings contribute significantly to the existing body of knowledge, offering new insights into the factors influencing in – hospital mortality and highlighting the potential of ML in enhancing clinical decision – making.

Lung cancer patients admitted to ICUs exhibit significantly higher mortality rates than those with other solid malignancies, with results varying considerably by disease stage. Existing literature indicates mortality rates exceeding 50% for patients with advanced or extensive-stage disease. Specifically, Park et al. documented a 58.3% mortality rate among critically ill patients with advanced lung cancer in their Korean cohort study (2008–2010) [18]. Similarly, Song et al. reported mortality rates of 82.4% and 65.9% for stage IIIB/IV NSCLC and extensive-stage SCLC patients prior to and following 2011, respectively [19]. Our findings are more consistent with Adam et al.’s report of a 20% mortality rate in stage I NSCLC patients, likely attributable to our cohort’s predominance of primary rather than metastatic lung cancer cases [20].

This study developed a predictive model for in-hospital mortality among lung cancer patients using ML approaches. Twenty-seven clinical variables collected within the first 24 hours of ICU admission were analyzed using eight different ML algorithms. The XGBoost model exhibited superior performance, demonstrating robust discrimination, calibration, and significant clinical utility. Validation in an independent test cohort further affirmed the model’s reliability and accuracy. To enhance interpretability, we employed the SHAP framework for feature importance analysis. The results identified eight key predictors:hospital stay duration, SAPS II score, administration of norepinephrine or vasopressin, PT, duration of mechanical ventilation, WBC, and BUN levels. Additionally, SHAP force plots offered valuable insights into the model’s decision-making process for assessing individual patient mortality risk.

The XGBoost algorithm, an advanced implementation of gradient-boosted decision trees, demonstrates significant efficacy in handling large-scale datasets characterized by complex feature spaces. Medical research has widely adopted XGBoost-based predictive frameworks, revealing superior performance across various clinical domains, including sepsis management, cardiovascular risk assessment, and renal impairment evaluation [21–23]. Unlike traditional LR techniques, XGBoost effectively identifies non-linear associations and integrates multiple weak classifiers into a robust predictive model, enhancing generalizability. The SHAP framework offers valuable insights into ML model behavior by quantifying feature contributions, thereby improving model interpretability.The XGBoost model achieved the highest predictive performance, achieving an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 (95% CI: 0.840–0.891)within the training cohort. The performance metrics for the test cohort reflected similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790(95% CI: 0.741–0.840).

This study systematically evaluated eight different ML approaches (LR, SVM, GBM, neural network, XGBoost, KNN, AdaBoost, and RF) for developing predictive models. Model performance was rigorously assessed using six established evaluation metrics: receiver operating characteristic curve analysis, F1 score, classification accuracy, sensitivity, specificity, and precision. Notably, our findings revealed superior predictive capability and stability with an ensemble approach that incorporated SVM, neural network, LR, and XGBoost, consistent with existing literature [24].

While comparing eight ML algorithms strengthens the methodological rigor of this study, the calibration curves (Fig A) reveal potential overfitting in some models, particularly Random Forest (RF) and k-Nearest Neighbors (k-NN).These models demonstrate a significant deviation from the ideal calibration line, suggesting that their predicted probabilities do not accurately reflect the observed event percentages in the validation dataset. Specifically, the observed event percentage is significantly higher in the validation data than the predicted probability from the KNN model and the RF model. This overconfidence in predictions is characteristic of overfitting, where the model has learned the training data too well and struggles to generalize to unseen data. While GBM also shows some calibration issues, the divergence is less pronounced than in RF and k-NN. This observation is reinforced when contrasting Fig 4A (training set) with Fig 4C (test set), where all models exhibit better calibration.

Through SHAP value analysis, we identified key predictive features, including hospital stay duration, SAPS II score, vasopressor administration (norepinephrine and vasopressin), PT, mechanical ventilation duration, WBC, and BUN levels. The SAPS II scoring system, a validated measure of organ dysfunction severity upon ICU admission, has demonstrated robust performance in mortality risk stratification [21,25–27]. Although extensively utilized for general ICU mortality prediction, the specific association between SAPS II scores and in-hospital mortality among lung cancer patients remains inadequately explored in current literature. Comparative studies have established the superior predictive validity of both SAPS II and SOFA scores for infection-related mortality in critical care populations, with score elevation correlating positively with increased mortality risk [28].

The association between the duration of ICU stay and patient mortality exhibits a biphasic pattern, indicating that both short stays (reflecting rapid clinical deterioration) and prolonged stays (suggesting persistent organ failure) are associated with increased mortality risks. However, existing research has not established precise cutoff values for these critical time points. Optimal clinical management should incorporate sequential physiological evaluations using standardized scoring systems, such as SAPS II, to facilitate comprehensive patient assessment [29,30]. If length of stay (LOS) is not known at admission, the use of first-day LOS can be explicitly stated as a proxy for SAPS II scoring.

Our predictive model exhibited consistent performance during external validation, confirming its robust generalizability across various clinical environments. Moreover, the selected predictive variables comprise standard clinical measurements, ensuring both ease of assessment and seamless integration across institutions with varying resource capacities.

Several methodological limitations merit attention in this study. First, the retrospective observational design introduces inherent selection bias and potential information bias due to inconsistencies in data collection and incomplete records. Second, whereas utilizing the MIMIC-IV database, the absence of external validation cohorts raises concerns regarding potential model overfitting, highlighting the necessity for future validation studies. Third, despite internal validation, the single-center nature of our data highlights the need for multicenter prospective studies to evaluate model generalizability across diverse populations. Most notably, the lack of histopathological and radiological characterization of lung cancer subtypes, particularly the distinction between small cell and non-small cell carcinomas, represents a significant limitation, as the absence of molecular diagnostic data may have affected model performance. Despite these limitations, our predictive model demonstrates clinical utility in assisting physicians with timely patient risk stratification.

In the process of developing our predictive model, we are acutely aware that several important clinical variables, namely cancer stage, metastasis, histology, and performance status, were not incorporated into the analysis. The absence of these variables has significant implications for the model’s clinical applicability.we acknowledge that the absence of these important clinical variables is a limitation of our study. By discussing the potential confounding effects and reasons for their missingness, we hope to provide a more comprehensive understanding of the model’s limitations and guide future research efforts to improve its clinical applicability.

While our XGBoost model demonstrates robust predictive accuracy in the MIMIC-IV cohort, its deployment in real-world ICU settings requires addressing several practical barriers. First, interoperability with existing electronic health record (EHR) systems is critical; this could be achieved through API-based integration or modular deployment within clinical decision support platforms.Second, clinician adoption hinges on interpretability—a challenge we mitigated via SHAP analysis (Fig 6), which quantifies and visualizes the impact of key variables (e.g., SAPS II, vasopressin use) on predictions. To foster trust, future iterations could incorporate interactive interfaces allowing clinicians to adjust inputs and observe risk estimates in real time. Regulatory approval would require prospective trials to evaluate model performance under dynamic ICU conditions and diverse patient demographics. A critical future direction involves deploying this model within a dedicated clinical platform to rigorously evaluate its runtime performance and integration feasibility.

Conclusion

In a clinical setting, especially at the bedside, healthcare providers need quick and easy – to – use tools for risk assessment. While the XGBoost model demonstrated the best performance in predicting in – hospital mortality among critically ill lung cancer patients, it may be complex for immediate use in a busy ICU environment. A simplified risk score can distill the key information from the model into a more accessible format, enabling rapid decision – making.

Based on this study, multiple models (logistic regression, support vector machine, artificial neural network, and XGBoost) have consistently identified several key predictive factors, which can serve as the basis for a simplified clinical risk score. These factors include hospital stay duration, SAPS II scores, vasopressor administration (norepinephrine and vasopressin), coagulation parameters (PT), mechanical ventilation duration, and metabolic markers (WBC and BUN). The process of developing a simplified score includes determining the weight of each predictive factor. This can be based on the results of SHAP analysis, the application of SHAP analysis improved clinical interpretability by clarifying feature contributions to mortality risk, providing intensivists with actionable insights for prognostic assessment in critical care settings.

In conclusion, developing a simplified clinical risk score based on the key predictive factors identified in this study is expected to improve the bedside applicability of the model and support clinical decision – making in the ICU. Although, a notable limitation of our study is that the analysis solely relied on internal divisions from MIMIC – IV. This restricts the model’s applicability in broader clinical scenarios.

Supporting information

S1 FigThe top 36 features derived from Logistic.(TIF)

S2 FigThe top 36 features derived from SVM.(TIF)

S3 FigThe top 36 features derived from Neural Network.(TIF)

S4 FigVisually XGBoost model using SHAP.(TIF)

S5 FigCorrelation heatmap and a table of correlation coefficients for the top features (SAPS II, hospital stay duration, PT, and norepinephrine use).(TIF)

Bibliography30

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Ferlay J, Colombet M, Soerjomataram I, Mathers C, Parkin DM, Piñeros M, et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int J Cancer. 2019;144(8):1941–53. doi: 10.1002/ijc.31937 30350310 · doi ↗ · pubmed ↗
2Bade BC, Dela Cruz CS. Lung cancer 2020: epidemiology, etiology, and prevention. Clin Chest Med. 2020;41(1):1–24. doi: 10.1016/j.ccm.2019.10.001 32008623 · doi ↗ · pubmed ↗
3Thai AA, Solomon BJ, Sequist LV, Gainor JF, Heist RS. Lung cancer. Lancet. 2021;398(10299):535–54. doi: 10.1016/S 0140-6736(21)00312-3 34273294 · doi ↗ · pubmed ↗
4ÖzpınarŞN, Gürün Kaya A, Öz M, Erol S, Arslan F, ÇiledağA, et al. Prognosis of lung cancer patients followed in the intensive care unit: a cross-sectional study. Tuberk Toraks. 2023;71(2):138–47. doi: 10.5578/tt.20239917 37345396 PMC 10795251 · doi ↗ · pubmed ↗
5Al-Dorzi HM, Atham S, Khayat F, Alkhunein J, Alharbi BT, Alageel N, et al. Characteristics, management, and outcomes of patients with lung cancer admitted to a tertiary care intensive care unit over more than 20 years. Ann Thorac Med. 2024;19(3):208–15. doi: 10.4103/atm.atm_287_23 39144533 PMC 11321528 · doi ↗ · pubmed ↗
6Liu W, Zhou D, Zhang L, Huang M, Quan R, Xia R, et al. Characteristics and outcomes of cancer patients admitted to intensive care units in cancer specialized hospitals in China. J Cancer Res Clin Oncol. 2024;150(4):205. doi: 10.1007/s 00432-024-05727-0 38642154 PMC 11032264 · doi ↗ · pubmed ↗
7Hautecloque-Rayz S, Albert-Thananayagam M, Martignene N, Le Deley M-C, Carbonnelle G, Penel N, et al. Long-Term Outcomes and Prognostic Factors of Patients with Metastatic Solid Tumors Admitted to the Intensive Care Unit. Oncology. 2022;100(3):173–81. doi: 10.1159/000520097 35051928 · doi ↗ · pubmed ↗
8Xie Z, Zhu S, Wang J, Zhang M, Lv X, Ma Y, et al. Relationship between coagulopathy score and ICU mortality: Analysis of the MIMIC-IV database. Heliyon. 2024;10(14):e 34644. doi: 10.1016/j.heliyon.2024.e 34644 39130418 PMC 11315200 · doi ↗ · pubmed ↗