A Unified Three-Stage Machine Learning Framework for Diabetes Detection, Subtype Discrimination, and Cognitive-Metabolic Hypothesis Testing

Vishal Pandey; Ruzina Haque Laskar; Rishav Tewari

arXiv:2605.13464 · cs.LG · May 14, 2026

TL;DR

This paper introduces a comprehensive three-stage machine learning framework for diabetes detection, subtype analysis, and cognitive association testing, emphasizing interpretability and reproducibility.

Contribution

The paper presents a reproducible pipeline that combines classification, clustering, and statistical analysis for diabetes research, covering subtype discovery and glycaemic-cognitive association analysis.

Findings

01 Supervised classifiers achieved a ROC-AUC of up to 0.825 (SVM-RBF and Logistic Regression).

02 K-Means clustering identified plausible diabetes subtypes without using labels.

03 Glycaemic control and cognitive function showed a significant positive correlation.
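The label-free clustering behind finding 02 can be illustrated with a minimal K-Means (Lloyd's algorithm) sketch in plain Python. This is not the authors' implementation: the function name `kmeans`, the random initialisation, and the one-dimensional toy points are assumptions made for illustration, and real clinical features would normally be standardised before clustering.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Returns (centroids, assignments). Initial centroids are k distinct
    input points drawn at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assign


if __name__ == "__main__":
    # Hypothetical one-feature toy data with two obvious groups.
    toy = [(0.0,), (1.0,), (10.0,), (11.0,)]
    centroids, labels = kmeans(toy, k=2)
    print(centroids, labels)
```

No class labels enter the procedure at any point; cluster indices are arbitrary and only become "subtypes" once their feature profiles are interpreted afterwards.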

Abstract

Diabetes mellitus affects over 537 million adults worldwide and remains a major challenge in preventive healthcare. Existing machine-learning studies primarily formulate diabetes prediction as a binary classification problem, while subtype-oriented analysis and glycaemic-cognitive associations remain comparatively underexplored. We present a reproducible three-stage machine learning framework for diabetes detection, subtype-oriented clustering, and metabolic-cognitive association analysis. In Stage 1, five supervised classifiers together with a stacking ensemble are benchmarked on the NCSU Diabetes Dataset using stratified five-fold cross-validation and evaluation metrics including ROC-AUC, balanced accuracy, recall, and F1-score. SVM-RBF and Logistic Regression achieve the highest ROC-AUC ($0.825 \pm 0.026$), while Random Forest achieves the highest accuracy ($0.762 \pm 0.030$). SHAP…
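The Stage 1 evaluation protocol named in the abstract (stratified five-fold cross-validation scored by ROC-AUC and summarised as mean ± standard deviation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`stratified_folds`, `roc_auc`, `cross_validated_auc`, `fit_mean_threshold`), the toy one-feature scorer, and the synthetic data are all hypothetical, and the paper's actual classifiers and dataset are not reproduced here.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def roc_auc(y_true, scores):
    """ROC-AUC as P(positive outranks negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, fit, k=5, seed=0):
    """Per-fold ROC-AUC; fit(X_train, y_train) must return a scoring function."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(roc_auc([y[i] for i in test_idx],
                            [score(X[i]) for i in test_idx]))
    return aucs


def fit_mean_threshold(X_train, y_train):
    """Toy one-feature scorer, a hypothetical stand-in for a real classifier."""
    mu1 = mean(x[0] for x, y in zip(X_train, y_train) if y == 1)
    mu0 = mean(x[0] for x, y in zip(X_train, y_train) if y == 0)
    sign = 1.0 if mu1 >= mu0 else -1.0
    return lambda x: sign * x[0]


if __name__ == "__main__":
    # Synthetic data only: 40 positives and 60 negatives, one noisy feature.
    rng = random.Random(42)
    y = [1] * 40 + [0] * 60
    X = [[rng.gauss(1.0 if label else 0.0, 1.0)] for label in y]
    aucs = cross_validated_auc(X, y, fit_mean_threshold)
    print(f"ROC-AUC: {mean(aucs):.3f} ± {stdev(aucs):.3f}")
```

Stratifying the folds keeps the positive/negative ratio roughly constant across splits, which is why a "mean ± standard deviation" over five folds is a reasonable summary even when the classes are imbalanced.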
