# Machine learning increases the prediction of stroke for Chinese hypertensive patients

**Authors:** Ying Zhou, Wanshu Deng, Wentao Wang

PMC · DOI: 10.3389/fmicb.2026.1737655 · Frontiers in Microbiology · 2026-01-23

## TL;DR

This study uses machine learning to build a model that predicts stroke risk in Chinese patients with hypertension more accurately than traditional risk scores.

## Contribution

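A high-precision, XGBoost-based machine learning model for predicting stroke in hypertensive patients. In discrimination it outperformed traditional Cox regression risk scores, including the China-PAR Score and the Framingham Stroke Risk Score.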

## Key findings

- The ML model achieved a C-statistic of 0.967, well above the best-performing traditional Cox regression model (0.781).
- The model showed acceptable calibration with a Brier score of 0.053.
- Recursive feature elimination (RFE) selected 10 of the 68 candidate variables to train the XGBoost model.
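The C-statistic reported above measures discrimination. For a binary outcome it equals the area under the ROC curve: the probability that a randomly chosen patient who had a stroke received a higher predicted risk than a randomly chosen patient who did not. The minimal sketch below computes it by counting pairs on hypothetical toy data. It ignores censoring. For time-to-event models such as the Cox scores, Harrell's C applies the same idea to usable pairs with follow-up times, and this summary does not state which estimator the authors used.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (C) statistic for a binary outcome (equal to ROC AUC).

    Counts event/non-event pairs in which the event case has the higher
    predicted risk; tied scores count as half a concordant pair.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data (not from the study): 1 = stroke, 0 = no stroke.
y = [1, 0, 0, 1, 0, 0]
risk = [0.9, 0.2, 0.4, 0.35, 0.1, 0.35]
print(c_statistic(y, risk))  # → 0.8125
```

Pair counting costs O(events × non-events). That is fine for a cohort of about 5,000 patients, and rank-based (Mann-Whitney) formulas give the same value in O(n log n).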

## Abstract

We aim to construct a machine learning (ML) model to predict stroke risk in patients with hypertension.

In all, 68 variables were included in the baseline analysis, covering demographic information, medical history and medication use, lifestyle, anthropometry, laboratory tests, electrocardiography, and echocardiography. Of these, 10 optimal variables were selected by recursive feature elimination (RFE). The prediction model was then trained and tested using eXtreme Gradient Boosting (XGBoost), with 10-fold cross-validation used throughout. Four traditional Cox regression models, including the China-PAR Score and the Framingham Stroke Risk Score model, were established and compared with the ML model. Finally, model performance was compared using the C-statistic for discrimination and the Brier score for calibration.
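RFE can be sketched as a loop that refits a model, ranks the surviving features, and discards the weakest until the target number remains. The loop runs inside each training split of a 10-fold cross-validation. The pure-Python sketch below is a minimal illustration under stated assumptions. The toy data, the helper names (`rfe`, `kfold_indices`, `importance_fitter`), and the absolute-correlation importance score are all hypothetical stand-ins. The paper used XGBoost importances, and this summary does not say whether feature selection was nested inside the folds as shown here. For brevity, the toy run keeps 2 of 6 features rather than 10 of 68.

```python
import random


def rfe(feature_names, fit_importances, n_keep):
    """Recursive feature elimination: refit on the current subset,
    drop the least important feature, and repeat until n_keep remain."""
    selected = list(feature_names)
    while len(selected) > n_keep:
        scores = fit_importances(selected)  # refit on the surviving features
        weakest = min(selected, key=lambda f: scores[f])
        selected.remove(weakest)
    return selected


def kfold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices once and deal them into k near-equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def abs_corr(xs, ys):
    """|Pearson r|: a toy stand-in (assumption) for XGBoost feature importance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def importance_fitter(rows):
    """Return a hypothetical scorer that uses only the given training rows."""
    def fit(features):
        outcome = [y[r] for r in rows]
        return {f: abs_corr([X[f][r] for r in rows], outcome) for f in features}
    return fit


# Hypothetical toy cohort: 200 patients, 6 candidate predictors, binary outcome.
rng = random.Random(42)
names = [f"x{i}" for i in range(6)]
X = {f: [rng.gauss(0, 1) for _ in range(200)] for f in names}
y = [int(X["x0"][r] + 0.5 * X["x1"][r] + rng.gauss(0, 0.5) > 1) for r in range(200)]

# 10-fold cross-validation: select features on each training split only,
# so the held-out fold never influences which variables are kept.
folds = kfold_indices(len(y), k=10)
selections = []
for test_rows in folds:
    held_out = set(test_rows)
    train_rows = [r for r in range(len(y)) if r not in held_out]
    selections.append(rfe(names, importance_fitter(train_rows), n_keep=2))

for i, kept in enumerate(selections):
    print(f"fold {i}: kept {kept}")
```

With a correlation scorer the ranking does not change between rounds, so RFE reduces to a univariate filter. A tree ensemble such as XGBoost re-ranks features after each refit, and that is what justifies the extra cost of RFE. With scikit-learn and xgboost installed, the usual library route is `sklearn.feature_selection.RFE(XGBClassifier(), n_features_to_select=10)` together with `StratifiedKFold(n_splits=10)`. Stratified folds also keep the low event rate similar across splits, which the plain folds above do not guarantee.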

In all, we included 5,197 hypertensive participants (mean age 57.16 ± 10.20 years) from the Northeast China Rural Cardiovascular Health Study (NCRCHS). Endpoint events occurred in 294 of them (5.7%; 185 males and 109 females) during a mean follow-up of 4.26 ± 1.03 years. Using RFE, 10 variables were selected to construct the XGBoost model. The ML model showed better discrimination than the best-performing Cox regression model [C-statistic 0.967 (95% CI 0.956–0.978) vs. 0.781 (95% CI 0.772–0.785)], with acceptable calibration (Brier score = 0.053).
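The Brier score is the mean squared difference between predicted risk and the observed 0/1 outcome, so lower is better. When events are rare, it is easiest to read against a reference model that assigns every patient the observed event rate p. That reference has a Brier score of p(1 - p). The sketch below computes both quantities from the cohort-level counts in the abstract (294 events among 5,197 participants). The reference value is not reported by the authors, and their evaluation set, prediction horizon, and handling of censoring may differ, so treat it only as a rough yardstick.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    if len(y_true) != len(p_pred):
        raise ValueError("length mismatch")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# Cohort-level counts from the abstract.
events, n = 294, 5197
rate = events / n

# Reference model: predict the cohort event rate for every patient.
# Its Brier score simplifies algebraically to rate * (1 - rate).
outcomes = [1] * events + [0] * (n - events)
reference = brier_score(outcomes, [rate] * n)
print(f"event rate = {rate:.4f}, reference Brier = {reference:.4f}")
# → event rate = 0.0566, reference Brier = 0.0534
```

A scaled Brier score, 1 - Brier / reference, expresses improvement over this baseline. A value of 0 means no improvement, and 1 means perfect predictions.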

Using the ML method, we constructed a high-precision prognostic model to predict stroke risk in patients with hypertension. The model showed better discrimination and overall performance than the traditional risk scales. It could be used in clinical practice to support early prevention of, and intervention for, stroke.

## Linked entities

- **Diseases:** stroke (MONDO:0005098)

## Full-text entities

- **Diseases:** Stroke (MESH:D020521), hypertension (MESH:D006973)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12880818/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12880818/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12880818/full.md

---
Source: https://tomesphere.com/paper/PMC12880818