Machine Learning Framework for HbA1c Prediction: Data Enrichment, Cost Optimization, and Interpretability Through Stratified Regression and Multi-Stage Feature Selection
Mohamed Ezz, Majed Abdullah Alrowaily, Menwa Alshammeri, Alshaimaa A. Tantawy, Azzah Allahim, Ayman Mohamed Mostafa

TL;DR
This paper presents a machine learning model that predicts HbA1c levels using a small set of clinical features, offering a cost-effective and interpretable solution for large-scale health assessments.
Contribution
The study introduces a unified framework for continuous HbA1c prediction that integrates cost-efficient feature selection, stratified regression, and model explainability.
Findings
The optimal model achieved R² = 0.7161 using only 40 selected features from 224 candidates.
Interpretability analysis showed clinically coherent relationships aligned with physiological expectations.
The framework reduces feature dependency and enables cost-efficient HbA1c estimation in resource-limited settings.
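As a rough illustration of the two ideas named above, the sketch below computes the coefficient of determination (R², the metric reported in the findings) and builds folds that are stratified on a continuous target. The stratification scheme shown here (sort by target, then deal each consecutive group across folds) is an assumption chosen for illustration; this summary does not specify the paper's exact procedure. The function names `r2_score` and `stratified_folds` and the synthetic values are hypothetical.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def stratified_folds(y, n_folds=5, seed=0):
    """Assign each sample a fold id so every fold spans the target range.

    Illustrative scheme (an assumption, not the paper's method):
    sort samples by target value, walk through them in consecutive
    groups of n_folds, and give each member of a group a different,
    randomly chosen fold.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    for start in range(0, len(order), n_folds):
        group = order[start:start + n_folds]
        fold_ids = list(range(n_folds))
        rng.shuffle(fold_ids)
        for idx, fold in zip(group, fold_ids):
            folds[idx] = fold
    return folds


# Synthetic HbA1c-like values (percent), for demonstration only.
hba1c = [5.0 + 0.1 * i for i in range(50)]
folds = stratified_folds(hba1c, n_folds=5)
```

Because each group of adjacent target values is spread over all folds, low, mid, and high HbA1c values appear in every validation split, which keeps per-fold R² estimates comparable.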
Abstract
Background: Measuring glycated hemoglobin (HbA1c) is essential for assessing long-term glycemic control, yet direct testing remains expensive and underutilized in many large-scale health surveys and resource-constrained settings. This study aims to (i) deliver a highly accurate and interpretable ML model for predicting HbA1c from routinely collected clinical, biochemical, and demographic data, (ii) reduce dependency on extensive laboratory panels by identifying a compact, cost-efficient subset of key predictors, and (iii) establish a transferable, explainable modeling framework applicable across chronic disease biomarkers. Unlike prior HbA1c prediction studies that focus primarily on classification or accuracy-driven models, this work introduces a unified framework for continuous HbA1c regression that jointly integrates cost-oriented feature parsimony, stratified regression validation,…
Taxonomy
Topics: Artificial Intelligence in Healthcare · Imbalanced Data Classification Techniques · Machine Learning in Healthcare
