# A machine learning model for predicting post-stroke epilepsy risk by integrating multimodal EEG-fMRI and clinical biomarkers

**Authors:** Ze Wang, Huanhuan Liu, Teng Ma

PMC · DOI: 10.3389/fneur.2026.1722475 · Frontiers in Neurology · 2026-02-17

## TL;DR

This study developed a random forest model that combines EEG-fMRI features with clinical and serum biomarkers to predict the risk of epilepsy after stroke (validation AUC 0.731), with the aim of identifying high-risk patients early.

## Contribution

A novel machine learning model integrating multimodal EEG-fMRI and clinical biomarkers for predicting post-stroke epilepsy risk.

## Key findings

- The random forest model achieved an AUC of 0.892 in the training set and 0.731 in the validation set for predicting post-stroke epilepsy.
- Epileptiform discharge frequency, NIHSS score, and stroke lesion volume were the top three features contributing to post-stroke epilepsy risk.
- The nomogram provided individualized risk visualization, and the SHAP feature ranking was consistent with the regression results, supporting the clinical plausibility of the model.

## Abstract

This study aimed to develop and validate a machine learning model integrating multimodal electroencephalography-functional magnetic resonance imaging (EEG-fMRI) features with clinical biomarkers for predicting post-stroke epilepsy (PSE) risk, thus providing a quantitative tool for early identification of high-risk patients.

A total of 365 acute stroke patients admitted to our hospital from January 2021 to June 2024 were retrospectively enrolled and randomly divided into training (n = 256) and validation (n = 109) sets in a 7:3 ratio. Demographic data, EEG parameters, multimodal MRI indices, and serum biomarkers were collected. In the training set, univariate analysis was first performed to screen relevant factors, followed by LASSO regression for variable selection. Multivariate logistic regression was ultimately used to identify independent risk factors. Based on key predictors, random forest (RF), support vector machine (SVM), and gradient boosting (GB) models were constructed using Python. Model performance was evaluated and optimized via the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). A nomogram was developed for risk visualization, and SHapley Additive exPlanations (SHAP) values were employed for interpretability analysis to quantify the direction and magnitude of feature contributions.
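The split and two of the evaluation metrics named above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the function names, the seed, the half-up rounding rule, and the six-patient toy data are all assumptions made for the example. AUC is computed as the probability that a randomly chosen positive case scores above a randomly chosen negative case, and the DCA net benefit uses the standard form TP/n − (FP/n) · p_t/(1 − p_t) at threshold probability p_t.

```python
import random


def split_70_30(ids, seed=0):
    """Shuffle ids reproducibly and cut them 7:3 (seed is an arbitrary choice)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    # Integer half-up rounding of 0.7 * n; assumed here because it
    # reproduces the reported 256/109 split for n = 365.
    n_train = (7 * len(shuffled) + 5) // 10
    return shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit when treating everyone with score >= threshold."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data, invented for illustration only (1 = developed PSE).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

train_ids, valid_ids = split_70_30(range(365))
print(len(train_ids), len(valid_ids))
print(auc(labels, scores))
print(net_benefit(labels, scores, 0.3))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.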

No significant differences in baseline characteristics were observed between the training and validation sets (P > 0.05), indicating that the two sets were comparable. Univariate and multivariate logistic regression showed that epileptiform discharge frequency (EDF), background EEG delta wave ratio (BEDWR), stroke lesion volume (SLV), National Institutes of Health Stroke Scale (NIHSS) score, and serum neuron-specific enolase (NSE) levels were independent risk factors for PSE (all P < 0.05). Among the three models, RF showed the best predictive performance, with AUCs of 0.892 (training set) and 0.731 (validation set). The nomogram enabled individualized risk calculation. SHAP analysis ranked EDF (highest mean SHAP value), NIHSS score, and SLV as the top three features, all contributing positively (higher values were associated with higher predicted PSE risk), which is consistent with the regression results and supports the clinical plausibility of the model.
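To make "direction and magnitude of feature contributions" concrete, the sketch below computes exact Shapley values for a single patient by enumerating feature coalitions. It is a hand-rolled illustration, not the SHAP library and not the paper's model: the linear log-odds function, its weights, the patient values, and the baseline are all invented, and "removing" a feature is assumed to mean replacing it with a single baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to one baseline point."""
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for r in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight for a coalition of size r that excludes feature i.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                # Features outside the coalition are set to their baseline value.
                without_i = {k: x[k] if k in subset else baseline[k] for k in names}
                with_i = dict(without_i)
                with_i[i] = x[i]
                total += weight * (f(with_i) - f(without_i))
        phi[i] = total
    return phi


def toy_log_odds(p):
    """Hypothetical linear risk score; the weights are made up for this example."""
    return -3.0 + 0.5 * p["edf"] + 0.1 * p["nihss"] + 0.02 * p["slv"]


patient = {"edf": 4.0, "nihss": 12.0, "slv": 30.0}   # invented values
baseline = {"edf": 1.0, "nihss": 6.0, "slv": 20.0}   # invented reference patient

phi = shapley_values(toy_log_odds, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(name, round(value, 3))
```

A positive value means the feature pushes this patient's score above the baseline, and the values sum to the difference between the patient's score and the baseline score. A ranking like the one reported above is typically obtained by averaging the absolute per-patient values over a cohort.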

An RF model integrating multimodal EEG-fMRI and clinical data was developed to predict PSE risk. EDF, NIHSS score, SLV, BEDWR, and serum NSE were identified as the core predictors.

## Linked entities

- **Diseases:** epilepsy (MONDO:0005027), stroke (MONDO:0005098)

## Full-text entities

- **Genes:** IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, ENO2 (enolase 2) [NCBI Gene 2026] {aka HEL-S-279, NSE}
- **Diseases:** EDF (MESH:D019522), neuronal injury (MESH:D009410), epileptiform (MESH:D014277), heart failure (MESH:D006333), infarct (MESH:D007238), Cerebrovascular Diseases (MESH:D002561), agitation (MESH:D011595), intracranial infections (MESH:D007239), end-stage renal disease (MESH:D007676), post (MESH:D000094025), ischemic stroke (MESH:D002544), intracerebral hemorrhage (MESH:D002543), atherosclerotic (MESH:D050197), gliosis (MESH:D005911), Epilepsy (MESH:D004827), end-stage liver disease (MESH:D058625), damage to (MESH:D020263), PSE (MESH:D004834), Hemolysis (MESH:D006461), neurological deficits (MESH:D009461), Seizure (MESH:D012640), Stroke (MESH:D020521), hemorrhage (MESH:D006470), traumatic brain injury (MESH:D000070642), hemorrhagic stroke (MESH:D000083302), rupture (MESH:D012421), ischemic (MESH:D002545), malignancy (MESH:D009369), NIHSS (MESH:C538175), matter injury (MESH:D014947), HL (MESH:C538324), neurodegenerative diseases (MESH:D019636), hematoma (MESH:D006406), metabolic disturbances (MESH:D024821)
- **Chemicals:** QDTC01214 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12953071/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12953071/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12953071/full.md

---
Source: https://tomesphere.com/paper/PMC12953071