A Unified Three-Stage Machine Learning Framework for Diabetes Detection, Subtype Discrimination, and Cognitive-Metabolic Hypothesis Testing
Vishal Pandey, Ruzina Haque Laskar, Rishav Tewari

TL;DR
This paper introduces a comprehensive three-stage machine learning framework for diabetes detection, subtype analysis, and cognitive association testing, emphasizing interpretability and reproducibility.
Contribution
It presents a novel, reproducible pipeline combining classification, clustering, and statistical analysis for diabetes research, including subtype discovery and cognitive association insights.
Findings
Supervised classifiers achieved ROC-AUC up to 0.825.
K-Means clustering identified plausible diabetes subtypes without labels.
A significant positive correlation was observed between glycaemic control and cognitive function.
Abstract
Diabetes mellitus affects over 537 million adults worldwide and remains a major challenge in preventive healthcare. Existing machine-learning studies primarily formulate diabetes prediction as a binary classification problem, while subtype-oriented analysis and glycaemic-cognitive associations remain comparatively underexplored. We present a reproducible three-stage machine learning framework for diabetes detection, subtype-oriented clustering, and metabolic-cognitive association analysis. In Stage 1, five supervised classifiers together with a stacking ensemble are benchmarked on the NCSU Diabetes Dataset using stratified five-fold cross-validation and evaluation metrics including ROC-AUC, balanced accuracy, recall, and F1-score. SVM-RBF and Logistic Regression achieve the highest ROC-AUC (0.825), while Random Forest achieves the highest accuracy. SHAP…
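The Stage 1 evaluation protocol named in the abstract (stratified five-fold splitting scored with ROC-AUC, balanced accuracy, recall, and F1-score) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code: the function names, the round-robin fold assignment, and the binary 0/1 label encoding are assumptions made for illustration, and a real pipeline would more likely rely on an established library.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Split sample indices into k folds, round-robin within each class,
    so every fold keeps roughly the overall class ratio (hypothetical helper)."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(y_true, y_pred):
    """Share of true positives that were detected (sensitivity)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and on the negative class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return (recall_pos + recall_neg) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Balanced accuracy and ROC-AUC are reported alongside plain accuracy because diabetes datasets are usually class-imbalanced, so a model can score high accuracy while missing many positive cases.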
