# Development and validation of an early predictive model for hemiplegic shoulder pain: a comparative study of logistic regression, support vector machine, and random forest

**Authors:** Qiang Wu, Fang Zhang, Yuchang Fei, Zhenfen Sima, Shanshan Gong, Qifeng Tong, Qingchuan Jiao, Hao Wu, Jianqiu Gong

PMC · DOI: 10.3389/fneur.2025.1612222 · Frontiers in Neurology · 2025-06-18

## TL;DR

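This study compares logistic regression, support vector machine, and random forest models for predicting hemiplegic shoulder pain (HSP) in stroke patients. Random forest performed best (accuracy 0.90, AUC-ROC 0.94), and SHAP analysis identified the predictors that drive its output.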

## Contribution

A novel comparative analysis of logistic regression, SVM, and random forest for early prediction of hemiplegic shoulder pain with explainability insights.

## Key findings

- The random forest model achieved the highest accuracy (0.90) and AUC-ROC (0.94) for predicting hemiplegic shoulder pain.
- SHAP analysis revealed that multiple injuries and shoulder joint flexion were the most influential predictors of HSP.
- The random forest model demonstrated strong clinical explainability and practical utility for early warning and management.

## Abstract

This study aims to identify predictors of hemiplegic shoulder pain (HSP) using machine learning algorithms, select the best-performing model, and predict the occurrence of HSP.

Data were collected from 332 stroke patients admitted to a tertiary hospital in Zhejiang Province between January 2022 and January 2023. Predictive variables were screened with LASSO regression, and three models selected using the LazyPredict package were then built: logistic regression (LR), support vector machine (SVM), and random forest (RF). Accuracy, precision, recall, and F1 score were calculated for each model, and receiver operating characteristic (ROC) curves and decision curve analysis (DCA) were plotted to compare the three models. An explainability analysis using SHAP was conducted on the best-performing model.
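The evaluation quantities named above (accuracy, precision, recall, F1 score, AUC-ROC, and the net benefit plotted in a decision curve) can all be derived from true labels and predicted probabilities. The following is a minimal standard-library sketch of those textbook definitions, not the authors' code. The function names and the toy labels and scores are assumptions made up for illustration and are not taken from the paper's data.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, with 1 = HSP and 0 = no HSP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a sketch but slow for large samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at threshold probability pt.

    net benefit = TP/n - (FP/n) * pt / (1 - pt)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy labels and predicted probabilities, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(classification_metrics(y_true, y_pred))
    print("AUC-ROC:", auc_roc(y_true, scores))
    for pt in (0.2, 0.5, 0.8):
        print(f"net benefit at pt={pt}:", net_benefit(y_true, scores, pt))
```

A decision curve plots `net_benefit` over a grid of thresholds for each model, alongside the "treat all" and "treat none" strategies. A model is useful at a given threshold when its curve lies above both.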

The RF model performed best, with an accuracy of 0.90, precision of 0.89, recall of 0.88, F1 score of 0.86, and AUC-ROC of 0.94; the threshold probability range in DCA was 7% to 99%. SHAP analysis of the RF model ranked the early predictors of HSP by contribution, from highest to lowest: multiple injuries, shoulder joint flexion (p), biceps tendon effusion, sensory disorder, supraspinatus tendinopathy, subluxation, diabetes, and age.
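A ranking of predictors "from high to low" is commonly obtained from SHAP output by averaging the absolute SHAP value of each feature across patients. The abstract does not state that the authors aggregated this way, so the sketch below is an assumption about the method. The function name, the generic feature names, and the numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import Sequence


def rank_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
) -> list[tuple[str, float]]:
    """Rank features by mean absolute SHAP value, highest first.

    shap_values[i][j] is the SHAP value of feature j for sample i.
    """
    n_samples = len(shap_values)
    importance = [
        (name, sum(abs(row[j]) for row in shap_values) / n_samples)
        for j, name in enumerate(feature_names)
    ]
    return sorted(importance, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder per-sample SHAP values (rows = patients, columns = features).
    # These numbers are invented and do not come from the study.
    names = ["feature_a", "feature_b", "feature_c"]
    values = [
        [0.30, -0.05, 0.10],
        [-0.20, 0.05, 0.20],
        [0.10, -0.02, -0.15],
    ]
    for name, score in rank_by_mean_abs_shap(values, names):
        print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes predictions up for some patients and down for others would otherwise cancel out and look unimportant.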

The RF model predicts HSP well and is clinically explainable. It can provide an objective reference for the early warning and stratified management of HSP.

## Linked entities

- **Diseases:** stroke (MONDO:0005098), diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** injuries (MESH:D014947), stroke (MESH:D020521), subluxation (MESH:D004204), diabetes (MESH:D003920), HSP (MESH:D020069), biceps tendon effusion (MESH:D052256), sensory disorder (MESH:D012678), shoulder joint flexion (MESH:D000070599)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12213372/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12213372/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12213372/full.md

---
Source: https://tomesphere.com/paper/PMC12213372