# Machine Learning for Predicting Stroke Risk Stratification Using Multiomics Data: Systematic Review

**Authors:** Hae Young Yoo, Hyerim Shin, Eun-Jung Kim, Youn-Jung Son

PMC · DOI: 10.2196/85654 · Journal of Medical Internet Research · 2026-02-19

## TL;DR

This systematic review of machine learning models that use multiomics data to predict stroke risk finds high apparent discrimination (AUCs of 0.75 to 0.97) but limited reproducibility, owing to small sample sizes, heterogeneous designs, and incomplete reporting.

## Contribution

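The study systematically evaluates ML models for stroke risk stratification built on multiomics data (7 studies, n=40,274), summarizing their discriminatory performance, integration strategies, and validation and reporting practices, and identifying the methodological limitations of the current evidence.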

## Key findings

- The seven included studies reported AUCs of 0.75 to 0.96 for acute diagnosis models and 0.75 to 0.97 for onset risk prediction models.
- All studies combined two omics layers, most often through middle-level integration of dyads such as metabolomics-proteomics and metabolomics-lipidomics.
- External validation was limited: only three studies externally validated the integrated multiomics model, and only three reported an assessment of calibration.

## Abstract

Stroke is a complex, multidimensional disorder influenced by interacting inflammatory, immune, coagulation, endothelial, and metabolic pathways. Single-omics approaches seldom capture this complexity, whereas multiomics techniques provide complementary insights but generate high-dimensional and correlated feature spaces. Machine learning (ML) offers strategies to manage these challenges; however, the predictive accuracy and reproducibility of multiomics-based ML models for stroke remain poorly characterized.

This review aimed to systematically evaluate ML models using multiomics data for stroke risk stratification and to characterize patterns in discriminatory performance, integration strategies, and validation and reporting practices to inform future methodological development.

We conducted a comprehensive literature search following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 recommendations. Studies published from January 2000 to July 2025 were identified across 9 databases: PubMed, MEDLINE Ultimate, EMBASE, CINAHL, Web of Science, Scopus, Cochrane CENTRAL, ACM Digital Library, and IEEE Xplore. Eligible studies included adults with ischemic, hemorrhagic, or unspecified stroke as the prediction target; applied at least 2 omics layers; and reported ML performance metrics. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST), while reporting quality was evaluated using Minimum Information for Medical AI Reporting. The primary outcome was the area under the receiver operating characteristic curve (AUC).

A total of 7 studies (n=40,274) published between 2022 and 2025 fulfilled the inclusion criteria. All studies combined 2 omics layers, most often using middle-level integration with dyads such as metabolomics-proteomics and metabolomics-lipidomics. Supervised ML algorithms across studies included support vector machines, tree-based ensembles, generalized linear models, and deep learning architectures. Three studies reported external validation of the integrated multiomics model, while 1 study conducted only an external assessment of a single marker rather than validation of the integrated model. Three studies reported an assessment of calibration, and clinically prespecified operating points were rarely described. Reported areas under the receiver operating characteristic curve varied by prediction task, ranging from 0.75 to 0.96 for acute diagnosis models and from 0.75 to 0.97 for onset risk prediction models; the highest externally validated performance was achieved by a support vector machine trained on a metabolomics-proteomics dyad in mixed stroke types (ischemic and hemorrhagic).
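Since the review's primary outcome is the area under the receiver operating characteristic curve, it may help to recall what that number measures. The sketch below computes it as the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not data from the included studies.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy example: 3 stroke cases (1) and 4 controls (0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The measure says nothing about calibration, which is one reason the review tracks calibration reporting separately.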

Multiomics ML models showed high apparent discrimination for stroke risk stratification, but current evidence remains methodologically limited. Small sample sizes, heterogeneous designs, and incomplete reporting currently hinder the reproducibility and generalizability of multiomics ML models for stroke risk prediction. To advance the field, future studies should adopt leakage-resistant evaluation frameworks, conduct site-specific external validations, and benchmark against both single-omics and clinical baselines to demonstrate incremental value. Well-designed, transparently reported investigations will be essential to move multiomics ML models from exploratory promise toward clinically actionable tools in precision stroke care.
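The abstract recommends leakage-resistant evaluation frameworks without specifying one. One common reading, assumed here, is that any data-dependent preprocessing step (scaling, feature selection, dimensionality reduction) should be fitted on the training portion of each split only and then applied unchanged to the held-out portion. The sketch below illustrates that idea with k-fold splitting and a standardization step; the helper names and the single toy feature are hypothetical and do not come from the reviewed studies.

```python
import random
import statistics
from typing import List, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        splits.append((train, test))
    return splits


def fit_scaler(values: List[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, sd


def transform(values: List[float], mean: float, sd: float) -> List[float]:
    """Standardize values with previously fitted parameters."""
    return [(v - mean) / sd for v in values]


# Hypothetical single omics feature measured on 20 participants.
feature = [float(v) for v in range(20)]
fold_reports = []
for train_idx, test_idx in kfold_indices(len(feature), k=5):
    train_vals = [feature[i] for i in train_idx]
    test_vals = [feature[i] for i in test_idx]
    mean, sd = fit_scaler(train_vals)               # fitted on the training fold only
    train_scaled = transform(train_vals, mean, sd)
    test_scaled = transform(test_vals, mean, sd)    # reuses training parameters
    fold_reports.append((train_idx, test_idx, train_scaled, test_scaled))
```

Fitting the scaler on all 20 values before splitting would let held-out samples influence the training features, which is the kind of leakage that can inflate apparent discrimination. The same pattern extends to feature selection and to the integration step itself.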

## Linked entities

- **Diseases:** stroke (MONDO:0005098), ischemic stroke (MONDO:1060198), hemorrhagic stroke (MONDO:1060199)

## Full-text entities

- **Genes:** HBB (hemoglobin subunit beta) [NCBI Gene 3043] {aka CD113t-C, ECYT6, beta-globin}, COL15A1 (collagen type XV alpha 1 chain) [NCBI Gene 1306], C4BPA (complement component 4 binding protein alpha) [NCBI Gene 722] {aka C4BP, PRP}, ITGAM (integrin subunit alpha M) [NCBI Gene 3684] {aka CD11B, CR3A, HNA-4, MAC-1, MAC1A, MO1A}, GDF15 (growth differentiation factor 15) [NCBI Gene 9518] {aka GDF-15, HG, MIC-1, MIC1, NAG-1, PDF}, IGFBP4 (insulin like growth factor binding protein 4) [NCBI Gene 3487] {aka BP-4, HT29-IGFBP, IBP4, IGFBP-4}
- **Diseases:** hemorrhagic (MESH:D006470), ML (MESH:D007859), PROBAST (MESH:D004195), Stroke (MESH:D020521), neurological deficits (MESH:D009461), metabolic abnormalities (MESH:D008659), inflammatory (MESH:D007249), ischemic (MESH:D002545), hemorrhagic stroke (MESH:D000083302), cognitive decline (MESH:D003072), hypertensive (MESH:D006973), vascular brain injury (MESH:D020214), coagulation (MESH:D001778), ischemic stroke (MESH:D002544)
- **Chemicals:** glucosylceramide (MESH:D005963), ceramide (MESH:D002518), oPE (MESH:C005448), triacylglycerol (MESH:D014280), lipid (MESH:D008055), lysophosphatidylcholine (MESH:D008244), diacylglycerol (MESH:D004075), (O-acyl)-1-hydroxy fatty acid (-), phosphatidylethanolamine (MESH:C483858)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12963974/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12963974/full.md

## References

71 references — full list in the complete paper: https://tomesphere.com/paper/PMC12963974/full.md

---
Source: https://tomesphere.com/paper/PMC12963974