Advancing Tabular Stroke Modelling Through a Novel Hybrid Architecture and Feature-Selection Synergy
Yousuf Islam, Md. Jalal Uddin Chowdhury, Sumon Chandra Das

TL;DR
This study develops an accurate, interpretable machine-learning framework for stroke prediction from routine demographic and clinical data, reporting 97.2% accuracy where most earlier tabular models stay below 95%.
Contribution
The paper introduces a novel hybrid architecture, paired with feature selection based on point-biserial correlation and random-forest Gini importance, that reaches near-clinical accuracy in tabular stroke prediction.
Findings
Achieved 97.2% accuracy and 97.15% F1-score in stroke prediction.
Preprocessing (missing-value handling, outlier removal, and SMOTE class balancing) combined with hybrid modeling substantially improved performance.
Outperformed individual models such as LightGBM by a large margin.
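The findings are stated as accuracy and F1-score. As a minimal sketch (not the authors' evaluation code), both metrics can be computed from true and predicted label vectors as below. The source does not say whether its F1 is binary, macro-averaged, or weighted; this sketch assumes the binary F1 of the positive (stroke) class, and the label vectors are made-up toy values, not data from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only (1 = stroke, 0 = no stroke).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))  # prints 0.75 0.75
```

On an imbalanced cohort, accuracy alone can look high even when the minority class is missed, which is why F1 is usually reported alongside it.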
Abstract
Brain stroke remains one of the principal causes of death and disability worldwide, yet most tabular-data prediction models still hover below the 95% accuracy threshold, limiting their real-world utility. To address this gap, the present work develops and validates a fully data-driven, interpretable machine-learning framework that predicts stroke from ten routinely collected demographic, lifestyle, and clinical variables drawn from a public cohort of 4,981 records. A detailed exploratory data analysis (EDA) characterizes the dataset's structure and distributions, followed by rigorous preprocessing: handling missing values, removing outliers, and correcting class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). For feature selection, point-biserial correlation and random-forest Gini importance were used, and ten varied…
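The abstract names SMOTE for class-imbalance correction. The following is a minimal pure-Python sketch of the core SMOTE step, not the authors' pipeline: each synthetic point is placed at a random position on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function names, the default k=5, and the toy data are illustrative assumptions; in practice a library implementation such as imbalanced-learn's `SMOTE(...).fit_resample(X, y)` would usually be used.

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    origin = points[idx]
    dists = sorted(
        (math.dist(origin, p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class samples by SMOTE-style interpolation.

    Hypothetical helper for illustration, not the paper's implementation.
    minority: list of numeric feature vectors, all from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # one of its k nearest minority neighbours
        gap = rng.random()                # uniform draw in [0, 1)
        base, nn = minority[i], minority[j]
        # The new point lies on the segment from base towards nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nn)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(toy_minority, n_synthetic=3, k=2, seed=42):
        print(point)
```

Oversampling is normally fitted on the training split only, so synthetic points derived from test records cannot leak into evaluation.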
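Point-biserial correlation, one of the two feature-ranking criteria named in the abstract, is Pearson's r between a numeric feature and a binary (0/1) label: r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the feature means in each class and s_n is the population standard deviation. The sketch below implements that formula and ranks features by its absolute value; it is an illustration under these assumptions, not the authors' code, and the feature names and values are hypothetical. SciPy's `scipy.stats.pointbiserialr` computes the same coefficient.

```python
import math


def point_biserial(x, y):
    """Point-biserial correlation between numeric feature x and binary label y (0/1).

    Equals Pearson's r between x and y; uses the population standard deviation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    group1 = [xi for xi, yi in zip(x, y) if yi == 1]
    group0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0, n = len(group1), len(group0), len(x)
    if n1 + n0 != n:
        raise ValueError("y must contain only 0 and 1")
    if n1 == 0 or n0 == 0:
        raise ValueError("y must contain both classes")
    mean_all = sum(x) / n
    s_n = math.sqrt(sum((xi - mean_all) ** 2 for xi in x) / n)
    if s_n == 0:
        raise ValueError("x is constant, correlation is undefined")
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    return (m1 - m0) / s_n * math.sqrt(n1 * n0 / n**2)


def rank_features(features, y):
    """Order feature names by |r_pb| with y, strongest first (hypothetical helper)."""
    scores = {name: point_biserial(values, y) for name, values in features.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    toy = {"feature_a": [1.0, 2.0, 3.0, 4.0], "feature_b": [1.0, 2.0, 2.0, 1.0]}
    print(rank_features(toy, labels))  # prints ['feature_a', 'feature_b']
```

Ranking by absolute value keeps features that are strongly but negatively associated with the label; a cut-off on |r_pb| (or the top-k names) would then define the selected subset.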
Taxonomy
Topics: Acute Ischemic Stroke Management · Imbalanced Data Classification Techniques · Machine Learning in Healthcare
Methods: Logistic Regression
