# A machine learning-based predictive model for postoperative pulmonary complications in lung cancer and its SHAP interpretation

**Authors:** Yun Sha, Zejing Huangfu, Xinyu Gu, Beining Tang, Zhenchao Lv, Yanming Li, Ji Yang, Jinyuan Yang, Shihao Shao, Zhonghui Wang

PMC · DOI: 10.3389/fonc.2026.1749808 · Frontiers in Oncology · 2026-03-13

## TL;DR

This study develops machine learning models to predict postoperative pulmonary complications (PPCs) after lung cancer surgery and uses SHAP analysis to explain the key risk factors.

## Contribution

A novel predictive model with SHAP interpretation for preoperative risk assessment of postoperative pulmonary complications in lung cancer surgery.

## Key findings

- The KNN model performed best among the evaluated algorithms, with a high AUROC and favorable net benefit in decision curve analysis.
- SHAP analysis identified perioperative inflammatory burden, diabetes, hypertension, and smoking history as key risk factors for PPCs.
- The model provides a reliable tool for preoperative risk stratification and personalized care planning.

## Abstract

Postoperative pulmonary complications (PPCs) significantly impair patient recovery and adversely affect the long-term prognosis following lung cancer surgery. Despite ongoing advancements in surgical techniques and perioperative care, the incidence of PPCs remains elevated, underscoring the pressing clinical necessity for dependable preoperative risk assessment tools.

This study employed a retrospective design, encompassing 1,223 patients who underwent lung cancer surgery, from whom perioperative clinical data were collected. Following data cleansing and feature selection, the dataset was stratified and randomly divided into training (70%) and testing (30%) sets. Model development and hyperparameter tuning were executed using stratified 10-fold cross-validation (CV) within the training set; all preprocessing and feature selection procedures were confined to the training folds to prevent information leakage. The discriminative and calibration performance of various machine learning algorithms was assessed, and clinical net benefit was appraised using decision curve analysis (DCA). Additionally, Shapley Additive Explanations (SHAP) were employed to elucidate the contributions of specific features to the risk of developing PPCs.
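The split-and-tune procedure described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' code. The synthetic data, the `SelectKBest` feature selector, the parameter grid, and the random seeds are all assumptions made for the example. Placing scaling and feature selection inside a `Pipeline` is one common way to ensure they are refit on the training folds only, which is the leakage safeguard the abstract refers to.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (NOT the study data): 1,223 rows with an
# imbalanced binary outcome. Feature count and class balance are assumptions.
X, y = make_classification(
    n_samples=1223, n_features=20, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)

# Stratified 70/30 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Scaling and feature selection live inside the pipeline, so during CV they
# are fit on each training fold only and never see the validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),  # assumed selector
    ("knn", KNeighborsClassifier()),
])

# Hypothetical search grid; the paper's actual grid is not given in this summary.
param_grid = {"select__k": [5, 10], "knn__n_neighbors": [5, 15, 25]}

# Stratified 10-fold CV within the training set, scored by AUROC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

# The test set is used once, after tuning, with the refit best pipeline.
test_auroc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"best params: {search.best_params_}")
print(f"held-out AUROC: {test_auroc:.3f}")
```

The test set plays no part in tuning here. `GridSearchCV` refits the best configuration on the full training set, and only then is the held-out AUROC computed.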

Among the evaluated models, the k-nearest neighbors (KNN) algorithm demonstrated superior performance, evidenced by a high area under the receiver operating characteristic curve (AUROC) and favorable clinical utility in the DCA. SHAP analysis revealed that factors such as perioperative inflammatory burden, diabetes, hypertension, and smoking history are pivotal in influencing the risk of PPCs.
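For readers unfamiliar with DCA, the quantity it plots is net benefit at a chosen risk threshold. With threshold probability p_t, net benefit is TP/n - (FP/n) * p_t / (1 - p_t), and it is compared against the "treat all" and "treat none" strategies. The sketch below uses the standard definition with made-up toy labels and probabilities. The function names are hypothetical, and nothing here reproduces the paper's curves.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = y_true.size
    treated = y_prob >= threshold
    true_pos = np.sum(treated & (y_true == 1))
    false_pos = np.sum(treated & (y_true == 0))
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = np.mean(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy example (illustrative values only, not study data).
y_true = np.array([1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25])

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model shows clinical utility at a given threshold when its net benefit exceeds both the treat-all value and zero (treat none). A decision curve repeats this comparison across a range of thresholds.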

The developed machine learning-based predictive model, augmented with SHAP interpretations, effectively identifies patients at high risk for PPCs prior to surgery. This model provides a robust scientific foundation for tailored perioperative care and interventions, offering substantial potential for clinical application.
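The SHAP interpretations mentioned above rest on Shapley values, which split the gap between one patient's prediction and the average prediction across the input features. The sketch below computes exact Shapley values by enumerating feature subsets. It is an assumption-laden illustration that does not use the `shap` library and is not the authors' procedure. The linear toy risk score, its weights, and the three unnamed binary features are hypothetical. Exact enumeration is feasible only for a handful of features, and SHAP explainers generally approximate it.

```python
from itertools import combinations
from math import factorial

import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x by enumerating feature subsets.

    The value of a subset S is the mean prediction over the background rows
    after overwriting the features in S with the values from x.
    """
    n = x.size

    def value(subset):
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return predict(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical toy risk score over three binary features (not the paper's model).
weights = np.array([0.3, 0.2, 0.1])

def predict(data):
    return 0.1 + data @ weights

background = np.array(
    [[0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]], dtype=float
)
x = np.array([1.0, 1.0, 0.0])

phi = shapley_values(predict, x, background)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the mean background prediction. This additivity property is what allows per-feature contributions to be read off a SHAP plot.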

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138), diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** diabetes (MESH:D003920), PPCs (MESH:D011183), lung cancer (MESH:D008175), inflammatory (MESH:D007249), hypertension (MESH:D006973)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13022757/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13022757/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC13022757/full.md

---
Source: https://tomesphere.com/paper/PMC13022757