# Development and validation of a prediction model for long-term cognitive frailty risk in stroke patients based on CHARLS data

**Authors:** Shunli Zuo, Ning Liu, Jiaxian Wang, Jingling Li, Xiuyuan Zhu, Yuyang Jia

PMC · DOI: 10.1371/journal.pone.0340715 · PLOS One · 2026-03-25

## TL;DR

This study developed and validated eight machine learning models to predict cognitive frailty risk in community-dwelling elderly stroke survivors using CHARLS data; XGBoost performed best (AUC = 0.810).

## Contribution

The study introduces a validated XGBoost model for predicting cognitive frailty in stroke patients using accessible clinical and demographic data.

## Key findings

- XGBoost and Random Forest showed the highest predictive performance with AUCs of 0.810 and 0.795, respectively.
- Age (SHAP value 0.32) and education (SHAP value 0.28) were the strongest predictors of cognitive frailty risk.
- The model is proposed for early screening and targeted interventions in primary care settings, pending further external validation.

## Abstract

This study aimed to develop and validate machine learning (ML) models for predicting the risk of cognitive frailty in community-dwelling elderly adults with stroke.

This study involved 2,325 stroke survivors from the China Health and Retirement Longitudinal Study (CHARLS), conducted between 2018 and 2020. We examined 22 behavioral variables, encompassing indicators from the sociodemographic, physical, psychological, cognitive, and social domains. LASSO regression was used to select predictive factors, and eight machine learning models (Logistic Regression, Decision Tree, XGBoost, Support Vector Machine, k-Nearest Neighbors, Naïve Bayes, Random Forest, and LightGBM) were compared to identify the best model for predicting cognitive frailty among stroke survivors. SHapley Additive exPlanations (SHAP) values were used to interpret the contribution of each variable.
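The workflow described above (sparse feature selection, then a comparison of several classifiers) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it runs on synthetic data rather than CHARLS, uses an L1-penalised logistic regression from scikit-learn as a stand-in for the LASSO step, compares only three of the eight model families, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost. All hyperparameters (`C=0.1`, `n_estimators=200`, 5-fold cross-validation) are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 22 candidate predictors, roughly 30% positives.
X, y = make_classification(
    n_samples=600,
    n_features=22,
    n_informative=6,
    weights=[0.7, 0.3],
    random_state=0,
)


def l1_selector():
    # Assumption: an L1-penalised logistic regression plays the role of LASSO.
    # Features whose coefficients shrink to (near) zero are dropped.
    l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    return SelectFromModel(l1_model, threshold=1e-5)


# Three illustrative model families; gradient boosting stands in for XGBoost.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

cv_auc = {}
for name, model in candidates.items():
    # Selection sits inside the pipeline so it is refit on each training fold,
    # which avoids leaking validation data into the feature-selection step.
    pipeline = make_pipeline(StandardScaler(), l1_selector(), model)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
    cv_auc[name] = float(scores.mean())

best_model = max(cv_auc, key=cv_auc.get)
for name, auc in sorted(cv_auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} mean cross-validated AUC = {auc:.3f}")
```

Keeping the selector inside the pipeline is a design choice of this sketch; the summary does not say whether the study selected features before or within its validation splits.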

Of the 2,325 stroke patients included, 688 (29.59%) exhibited symptoms of cognitive frailty. Of the eight models evaluated, XGBoost (AUC = 0.810) and Random Forest (AUC = 0.795) demonstrated the highest predictive performance for stroke-related cognitive frailty. The key predictors, with their SHAP values, were age (0.32), education (0.28), Instrumental Activities of Daily Living (IADL; 0.21), nutritional status (0.18), and physical exercise (0.16). Age and education level were therefore the most influential factors in predicting the risk of cognitive frailty in this population.
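As a quick arithmetic check on the figures reported above, the short sketch below recomputes the prevalence and ranks the five predictors by their reported SHAP values. The variable names are illustrative; the numbers are taken directly from the abstract.

```python
# Counts reported in the abstract.
n_total = 2325
n_cognitive_frailty = 688

# Prevalence of cognitive frailty, as a percentage rounded to two decimals.
prevalence_pct = round(100 * n_cognitive_frailty / n_total, 2)
print(f"Prevalence: {prevalence_pct}%")

# SHAP values reported for the five key predictors.
shap_by_predictor = {
    "education": 0.28,
    "nutritional status": 0.18,
    "physical exercise": 0.16,
    "IADL": 0.21,
    "age": 0.32,
}

# Rank predictors from most to least influential.
ranked = sorted(shap_by_predictor.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranked:
    print(f"{name:<20} {value:.2f}")
```

The ranking puts age first and education second, in line with the abstract's statement that these two are the most significant factors.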

This study developed eight risk prediction models for post-stroke cognitive frailty utilizing machine learning, with the XGBoost algorithm demonstrating superior performance. Leveraging readily available clinical and demographic indicators, the optimized XGBoost model serves as a practical tool for the early screening of cognitive frailty risk among community-dwelling elderly stroke survivors, particularly within primary care settings. This model can aid clinicians in devising targeted intervention strategies to mitigate disease progression and establish a foundation for future prospective studies examining the mechanisms underlying cognitive frailty in stroke populations. Further external validation is necessary to confirm its generalizability across various clinical contexts.

## Linked entities

- **Diseases:** stroke (MONDO:0005098)

## Full-text entities

- **Genes:** CST3 (cystatin C) [NCBI Gene 1471] {aka ADLDWA, ARMD11, HEL-S-2}
- **Diseases:** visual or auditory impairments (MESH:D014786), weight loss (MESH:D015431), brain atrophy (MESH:C566985), Dementia (MESH:D003704), AIS (MESH:D000083242), inflammation (MESH:D007249), anxiety (MESH:D001007), systemic diseases (MESH:D034721), chronic pain (MESH:D059350), Alzheimer's disease (MESH:D000544), heart failure (MESH:D006333), malnourished (MESH:D044342), Depression (MESH:D003866), end-stage liver disease (MESH:D058625), MCI (MESH:D060825), Stroke (MESH:D020521), dysarthria (MESH:D004401), CF (MESH:D000073496), PSCI (MESH:D003072), psychiatric disorders (MESH:D001523), aphasia (MESH:D001037)
- **Chemicals:** folic acid (MESH:D005492), vitamin D (MESH:D014807), flavonoids (MESH:D005419)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13016350/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13016350/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/PMC13016350/full.md

---
Source: https://tomesphere.com/paper/PMC13016350