# Machine learning model for predicting malnutrition risk in lung cancer patients after thoracoscopic resection: a multi-center study

**Authors:** Tianfeng Chen, Ruilan Pan, Ling Liang, Limei Xu, Mingyue Yang, Xiujuan Deng, Ping Wang

PMC · DOI: 10.3389/fonc.2026.1727595 · Frontiers in Oncology · 2026-02-09

## TL;DR

This study develops an explainable machine learning model that predicts malnutrition risk in lung cancer patients after thoracoscopic resection and deploys it as a web-based tool to support clinical decision-making.

## Contribution

The novel contribution is an interpretable machine learning model for malnutrition risk prediction in lung cancer patients, validated across multiple centers and made accessible via a web-based calculator.

## Key findings

- The XGBoost model achieved an AUC of 0.845 in the testing set and 0.886 in external validation for predicting malnutrition risk.
- SHAP analysis improved model interpretability by clarifying the relative importance of risk factors such as albumin and the Nutritional Risk Screening 2002 score.
- A web-based risk calculator was developed to support personalized nutritional interventions in clinical practice.

## Abstract

Early detection of malnutrition is critical for timely intervention in lung cancer patients undergoing thoracoscopic resection. Existing black-box prediction models lack clinical interpretability, which limits clinicians' trust in them and their use in practice. This study aimed to build an explainable machine learning (ML) model to predict malnutrition risk, to evaluate its performance across multiple sites, and to deploy it as a web-based application to aid clinical decision-making.

A retrospective analysis was conducted on 1,134 lung cancer patients who underwent thoracoscopic resection at Dongguan People’s Hospital between October 2021 and October 2024, comprising a training set (n = 795) and a testing set (n = 339). An external validation cohort (n = 273) was prospectively enrolled at the Affiliated Hospital of Guangdong Medical University from March to June 2025. Univariate and multivariate analyses were used to identify independent risk factors for post-operative malnutrition. Eight ML models were then constructed: Gradient Boosting Machine (GBM), Neural Network, Logistic Regression, Extreme Gradient Boosting (XGBoost), Random Forest, K-Nearest Neighbors (KNN), Adaptive Boosting (AdaBoost), and Support Vector Machine (SVM). Model performance was assessed with receiver operating characteristic (ROC) curves and decision curve analysis (DCA). The SHapley Additive exPlanations (SHAP) method was used to quantify feature contributions and visualize model outputs, enhancing clinical interpretability. Finally, a web-based risk calculator was built to support individualized risk prediction.
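The SHAP step rests on Shapley values from cooperative game theory: each feature's contribution is its marginal effect on the prediction, averaged over every possible coalition of the other features. The sketch below computes them exactly with the standard library, assuming that a feature "absent" from a coalition is simply set to a single baseline patient's value. The `shap` library's tree explainer instead averages over background data and uses a much faster tree-specific algorithm, so this illustrates the idea rather than the authors' pipeline. The function name `exact_shapley` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    model: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for the single prediction model(x).

    Assumption: a feature outside the coalition is set to its baseline value
    (one reference patient). All 2^(n-1) coalitions are enumerated per
    feature, so this is only practical for a handful of features.
    """
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset) -> float:
        # Coalition members keep the patient's value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk model with an interaction between the first two features.
def toy_model(z: List[float]) -> float:
    return z[0] * z[1] + z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

For the toy model with patient `x = (1, 1, 1)` and baseline `(0, 0, 0)`, the interaction term is split evenly between the first two features, so the contributions are approximately 0.5, 0.5 and 1.0. They sum to `f(x) - f(baseline) = 2`, which is the additivity property that lets a SHAP plot decompose one patient's predicted risk into per-factor pieces.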

Among the 1,407 patients in total, the incidence of post-operative malnutrition was 11.3% (159/1,407). Multivariate analysis identified seven independent risk factors: albumin (ALB), Nutritional Risk Screening 2002 (NRS 2002) score, age, intraoperative blood loss, total drainage volume, Basic Activities of Daily Living (BADL) score, and serum potassium (K). The XGBoost model outperformed the other models, with an AUC of 0.845 (95% CI: 0.771–0.919) in the testing set and 0.886 (95% CI: 0.841–0.932) in external validation. SHAP analysis clarified the relative importance of these risk factors, improving interpretability.
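The AUCs above are reported with 95% confidence intervals, but the abstract does not say how those intervals were computed. As one hedged illustration, the sketch below computes AUC directly as the probability that a randomly chosen malnourished patient receives a higher predicted risk than a randomly chosen non-malnourished one (ties count one half), and wraps it in a percentile bootstrap over patients. DeLong's method is another common choice. The function names and defaults (`n_boot`, `alpha`, `seed`) are assumptions for illustration.

```python
import random
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUC plus a percentile bootstrap (1 - alpha) interval.

    Patients are resampled with replacement; resamples that lack either
    class are skipped because AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, scores), lo, hi


# Hypothetical labels (1 = malnourished) and predicted risks.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
point, lo, hi = bootstrap_auc_ci(y, s, n_boot=200)
print(point, lo, hi)
```

With these labels and scores, three of the four positive/negative pairs are ranked correctly, so the point AUC is 0.75. The pairwise loop costs positives times negatives comparisons per resample, which is manageable at the cohort sizes reported here; for larger data a rank-based formula is faster.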

The XGBoost-based explainable ML model effectively predicts malnutrition risk in lung cancer patients after thoracoscopic resection. Integrating high predictive performance with interpretability, it supports clinical risk stratification and personalized nutritional interventions to improve post-operative outcomes. A publicly available web-based calculator facilitates easy clinical application.
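Decision curve analysis, one of the evaluation methods named in the abstract, asks whether acting on the model's predicted risk at a chosen threshold probability does more good than harm, which is the question behind threshold-based risk stratification. The sketch below implements the standard net-benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), alongside the "treat all" reference strategy ("treat none" is always 0). The example data and the 25% threshold are hypothetical; the summary does not report which threshold range the authors plotted.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of intervening in every patient whose predicted risk >= threshold.

    False positives are weighted by the odds pt / (1 - pt) implied by the threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: intervene in everyone, so every negative is a false positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical labels (1 = malnourished) and model-predicted risks.
y = [1, 0, 1, 0, 0]
risk = [0.9, 0.6, 0.3, 0.2, 0.05]
print(round(net_benefit(y, risk, 0.25), 3))  # 0.333
print(round(net_benefit_treat_all(y, 0.25), 3))  # 0.2
```

At the 25% threshold the model flags two true positives and one false positive among five patients, giving 0.4 - 0.2 * (0.25 / 0.75), or about 0.333, versus 0.2 for treating everyone and 0 for treating no one. A decision curve repeats this calculation across a range of thresholds and plots the three strategies together.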

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Genes:** CYGB (cytoglobin) [NCBI Gene 114757] {aka HGB, NOD, STAP}, TNF (tumor necrosis factor) [NCBI Gene 7124] {aka DIF, IMD127, TNF-alpha, TNFA, TNFSF2, TNLG1F}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** chronic obstructive pulmonary disease (MESH:D029424), fatigue (MESH:D005221), hypoalbuminemia (MESH:D034141), energy deficiency (MESH:D011502), respiratory dysfunction (MESH:D012131), pleural effusions (MESH:D010996), nausea (MESH:D009325), hemorrhage (MESH:D006470), squamous cell carcinoma (MESH:D002294), teeth loosening (MESH:D018677), malignant pleural effusion (MESH:D016066), hypoxia (MESH:D000860), paralytic ileus (MESH:D007418), gastrointestinal mucosal ischemia (MESH:D007511), airway mucosal function (MESH:D000402), impaired energy metabolism (MESH:D008659), chylothorax (MESH:D002916), pain (MESH:D010146), trauma (MESH:D014947), bowel movement (MESH:D012778), inflammation (MESH:D007249), gastrointestinal symptoms (MESH:D012817), loss (MESH:D016388), abnormal body composition and impaired physiological function (MESH:C564221), small cell carcinoma (MESH:D018288), anorexia (MESH:D000855), edema (MESH:D004487), blood loss (MESH:D016063), Lung Cancer (MESH:D008175), diabetes mellitus (MESH:D003920), muscle weakness (MESH:D018908), adenocarcinoma (MESH:D000230), dietary insufficiency (MESH:D000309), cancer (MESH:D009369), Hypokalemia (MESH:D007008), reduced appetite (MESH:D001068), tuberculosis (MESH:D014376), constipation (MESH:D003248), exercise phobia (MESH:D010698), abdominal distension (MESH:D000007), gastrointestinal smooth muscle paralysis (MESH:D018235), hypertension (MESH:D006973), blood (MESH:D006402), death (MESH:D003643), Malnutrition (MESH:D044342), anemia (MESH:D000740), hypermetabolism (MESH:C565498), infection (MESH:D007239), gastrointestinal dysfunction (MESH:D005767), compromised immunity (MESH:D007154)
- **Chemicals:** Cholesterol (MESH:D002784), Cl (MESH:D002712), nitrogen (MESH:D009584), MCT (MESH:C000709826), GLU (MESH:D005947), Ca (MESH:D002118), K (MESH:D011188), Na (MESH:D012964), TCH (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** KYKT2025 — Homo sapiens (Human), Finite cell line (CVCL_V825)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12926100/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12926100/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12926100/full.md

---
Source: https://tomesphere.com/paper/PMC12926100