# Interpretable machine learning-based predictive model for malnutrition in subacute post-stroke patients: an internal and external validation study

**Authors:** Ping Sun, Junqi Luan, Guotao Duan, Qingqing Sun, Genli Liu

PMC · DOI: 10.3389/fnut.2025.1692020 · 2026-01-05

## TL;DR

This study developed an interpretable CatBoost model to predict malnutrition risk in subacute post-stroke patients. The model reached an AUC of 0.806 on the internal testing set and 0.772 in external validation at an independent hospital.

## Contribution

The novel contribution is an interpretable machine learning model using CatBoost for malnutrition prediction in subacute stroke patients, validated internally and externally.

## Key findings

- The CatBoost model achieved an AUC of 0.848 in the training set and 0.806 in the testing set.
- External validation at an independent hospital yielded an AUC of 0.772, supporting the model's generalizability.
- Age, handgrip strength, and Barthel Index were identified as key predictors via SHAP analysis.
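To make the SHAP finding concrete, here is a minimal sketch of the Shapley attribution idea behind it. It computes exact Shapley values for a single patient by averaging each feature's marginal contribution over all feature orderings. Everything here is an assumption for illustration: `toy_risk`, its coefficients, the feature names, and the baseline and patient values are hypothetical and do not come from the study. Replacing "absent" features with a single baseline is also a simplification of what SHAP libraries do with a background dataset.

```python
from itertools import permutations
import math

def shapley_values(model, x, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start with every feature at its baseline value
        previous = model(current)
        for name in order:
            current[name] = x[name]       # switch this feature to the patient's value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(f):
    """Hypothetical logistic risk score; coefficients are illustrative only."""
    z = (0.04 * (f["age"] - 65)
         - 0.08 * (f["handgrip_kg"] - 25)
         - 0.02 * (f["barthel_index"] - 60))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical reference patient and patient of interest.
baseline = {"age": 65, "handgrip_kg": 25, "barthel_index": 60}
patient = {"age": 78, "handgrip_kg": 14, "barthel_index": 35}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's predicted risk and the baseline risk, which is the property that lets SHAP plots decompose a single prediction into per-feature contributions.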

## Abstract

Malnutrition is a critical concern associated with increased mortality and adverse outcomes in adults with stroke undergoing subacute rehabilitation. Despite its clinical significance, predictive tools for assessing malnutrition risk in this population remain limited. This study aimed to develop and validate an interpretable machine learning (ML) model to predict malnutrition risk among stroke patients during subacute rehabilitation.

This multicenter study comprised a development cohort of 802 patients from a single institution, which was randomly split into training and testing sets at a 7:3 ratio. An external validation cohort of 345 patients was recruited from an independent hospital. Feature selection was conducted using Least Absolute Shrinkage and Selection Operator (LASSO) regression combined with the Boruta algorithm. Eight ML models were trained using five-fold cross-validation: Logistic Regression (LR), Random Forests (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Support Vector Machines (SVM), k-Nearest Neighbors (KNN), Neural Network (NNet), and CatBoost (CAT). The models were evaluated on discrimination, calibration curves, and decision curve analysis (DCA). Model interpretability was assessed via Shapley Additive Explanations (SHAP) analysis.
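The data partitioning described above can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' code: the seed, the rounding rule for the 7:3 cut, the round-robin fold assignment, and the helper names `split_70_30` and `cv_rounds` are all hypothetical, and the abstract does not say whether the split was stratified by outcome.

```python
import random

def split_70_30(ids, seed=0):
    """Shuffle patient ids and split them 7:3 into training and testing lists."""
    rng = random.Random(seed)             # fixed seed (assumed) for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)      # rounding rule is an assumption
    return shuffled[:cut], shuffled[cut:]

def cv_rounds(ids, k=5):
    """Yield (fit_ids, validation_ids) pairs for k-fold cross-validation."""
    folds = [ids[i::k] for i in range(k)]  # round-robin folds of near-equal size
    for i, validation in enumerate(folds):
        fit = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield fit, validation

# Placeholder ids standing in for the 802-patient development cohort.
development = list(range(802))
train_ids, test_ids = split_70_30(development)
rounds = list(cv_rounds(train_ids, k=5))

print(len(train_ids), len(test_ids))
print([len(validation) for _, validation in rounds])
```

Cross-validation runs inside the training set only, so the testing set and the external cohort stay untouched until the final evaluation.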

The CAT algorithm exhibited superior predictive performance in the training and testing sets, achieving an area under the receiver operating characteristic curve (AUC) of 0.848 (95% CI: 0.817–0.879) and 0.806 (95% CI: 0.752–0.861), respectively. Calibration metrics underscored the model’s robustness, and DCA emphasized its clinical utility. External validation further corroborated the generalizability of the CAT model, demonstrating an AUC of 0.772 (95% CI: 0.723–0.820). SHAP analysis identified age, handgrip strength, and Barthel Index (BI) score as the most significant predictors of malnutrition.
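As a reading aid for the AUC values and their confidence intervals, the sketch below computes an AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, then estimates a 95% interval with a percentile bootstrap. The abstract does not state how the intervals were derived, so the bootstrap is an assumption, and the labels and scores are made-up toy values rather than study data.

```python
import random

def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:              # skip resamples that lost one class
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data for illustration only: 1 = malnourished, 0 = not malnourished.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.20, 0.70]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

With only eight toy cases the interval is very wide; intervals as narrow as those reported above require cohorts of several hundred patients.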

This study successfully developed and validated an ML model for efficiently screening malnutrition risk in patients with subacute stroke. The interpretable CAT-based model serves as a clinically actionable tool, enabling early stratification of malnutrition risk in subacute stroke patients. This facilitates the implementation of targeted nutritional interventions and personalized rehabilitation strategies, potentially improving outcomes in this vulnerable population.

## Linked entities

- **Diseases:** stroke (MONDO:0005098)

## Full-text entities

- **Genes:** CAT (catalase) [NCBI Gene 847], FGB (fibrinogen beta chain) [NCBI Gene 2244] {aka HEL-S-78p}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}
- **Diseases:** depression (MESH:D003866), post (MESH:D000094025), diabetes (MESH:D003920), NIHSS (MESH:C538175), metabolic dysregulation (MESH:D021081), weight loss (MESH:D015431), functional impairment (MESH:D003072), cardiovascular disease (MESH:D002318), Malnutrition (MESH:D044342), impaired self-care capacity (MESH:C000657744), nutritional deficits (MESH:D009748), hemiparesis (MESH:D010291), chronic kidney disease (MESH:D051436), ischemic attack (MESH:D002546), ML (MESH:D007859), end-stage cardiac/renal dysfunction (MESH:D007676), inflammation (MESH:D007249), pneumonia (MESH:D011014), long (MESH:D000094024), dyslipidemia (MESH:D050171), dysarthria (MESH:D004401), systemic disease (MESH:D034721), anorexia (MESH:D000855), Stroke (MESH:D020521), infection (MESH:D007239), paralysis (MESH:D010243), ischemic (MESH:D002545), frailty (MESH:D000073496), muscle mass (MESH:C536030), long-term disability (MESH:D000088562), Ischemic stroke (MESH:D002544), psychiatric disorders (MESH:D001523), hypertension (MESH:D006973), hemorrhagic stroke (MESH:D000083302), digestive disease (MESH:D004066), loss of appetite (MESH:D001068), sarcopenia (MESH:D055948), Dysphagia (MESH:D003680), malignant tumors (MESH:D009369), neurological impairment (MESH:D009422), aphasia (MESH:D001037), post-stroke dementia (MESH:D003704), reduced muscle mass (MESH:D009135), subarachnoid hemorrhage (MESH:D013345), muscle wasting (MESH:D009133), hemorrhagic (MESH:D006470)
- **Chemicals:** TC (MESH:D013667), TG (MESH:D013866), cholesterol (MESH:D002784), triglycerides (MESH:D014280), DCA (-), creatinine (MESH:D003404), uric acid (MESH:D014527)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12824420/full.md

---
Source: https://tomesphere.com/paper/PMC12824420