An Improved Ensemble-Based Machine Learning Model with Feature Optimization for Early Diabetes Prediction
Md. Najmul Islam, Md. Miner Hossain Rimon, Shah Sadek-E-Akbor Shamim, Zarif Mohaimen Fahad, Md. Jehadul Islam Mony, Md. Jalal Uddin Chowdhury

TL;DR
This paper presents an enhanced ensemble machine learning framework with feature optimization for early diabetes prediction, achieving high accuracy and ROC-AUC on a large health survey dataset, and includes a mobile app for practical use.
Contribution
It introduces a novel ensemble approach with feature optimization for diabetes prediction and develops a mobile application for accessible health monitoring.
Findings
Individual models (Random Forest, XGBoost, CatBoost, and LightGBM) each achieved a ROC-AUC of approximately 0.96.
Stacking ensemble achieved 94.82% accuracy and ROC-AUC of 0.989.
Developed a mobile app for early diabetes risk assessment.
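The two metrics reported above can be illustrated with a minimal sketch. It assumes binary labels (1 = diabetic, 0 = non-diabetic) and model scores in [0, 1]. The function names and toy values are hypothetical and are not taken from the paper. ROC-AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


# Toy values, made up for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is one reason both are commonly reported together on imbalanced data.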
Abstract
Diabetes is a serious worldwide health issue, and successful intervention depends on early detection. However, overlapping risk factors and class imbalance make prediction difficult. The aim of this study is to use extensive health survey data to create a machine learning framework for diabetes classification that is both accurate and interpretable, producing results that can aid clinical decision-making. Using the BRFSS dataset, we assessed a number of supervised learning techniques. SMOTE and Tomek Links were used to correct class imbalance. To improve prediction performance, both individual models and ensemble techniques such as stacking were investigated. The 2015 BRFSS dataset, which includes roughly 253,680 records with 22 numerical features, is used in this study. Strong ROC-AUC performance of approximately 0.96 was attained by the individual models Random Forest, XGBoost, CatBoost, and LightGBM. The…
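The resampling step named in the abstract can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the authors' pipeline. It assumes a binary problem, Euclidean distance, and brute-force neighbour search. The helper names (`smote`, `remove_tomek_links`) and the toy data are hypothetical. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. A Tomek link is a pair of opposite-class samples that are each other's nearest neighbour. This sketch drops both members of each link, although some variants drop only the majority-class member.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(X, y, minority, k=2, seed=0):
    """Oversample the minority class until both classes are equal in size."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)
        # k nearest minority neighbours of sample i (brute force).
        neighbours = sorted(
            (j for j in min_idx if j != i),
            key=lambda j: sq_dist(X[i], X[j]),
        )[:k]
        j = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        X_new.append([a + u * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop every pair of mutual nearest neighbours with different labels."""
    idx = range(len(X))
    nn = [
        min((j for j in idx if j != i), key=lambda j: sq_dist(X[i], X[j]))
        for i in idx
    ]
    drop = set()
    for i in idx:
        j = nn[i]
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data, made up for illustration: six majority and two minority samples.
X_toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.2, 0.2],
         [0.1, 0.2], [0.3, 0.1], [1.0, 1.0], [1.1, 0.9]]
y_toy = [0, 0, 0, 0, 0, 0, 1, 1]
X_over, y_over = smote(X_toy, y_toy, minority=1)
X_res, y_res = remove_tomek_links(X_over, y_over)
```

In practice a maintained implementation would replace this brute-force version, and resampling would be applied to the training split only so that the evaluation data keeps its original class distribution.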
Taxonomy
Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · Imbalanced Data Classification Techniques
