# Dual Machine Learning Framework for Predicting Long-Term Glycemic Change and Prediabetes Risk in Young Taiwanese Men

**Authors:** Chung-Chi Yang, Sheng-Tang Wu, Ta-Wei Chu, Chi-Hao Liu, Yung-Jen Chuang

PMC · DOI: 10.3390/diagnostics15192507 · Diagnostics · 2025-10-02

## TL;DR

This study uses machine learning to predict long-term blood sugar changes and prediabetes risk in young men from Taiwan, showing that baseline glucose levels are the strongest predictor.

## Contribution

A dual machine learning framework is proposed for predicting glycemic change and prediabetes risk with interpretable insights.

## Key findings

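- Machine learning models outperformed multiple linear regression on all error metrics when predicting long-term change in fasting plasma glucose.
- Baseline fasting glucose was the dominant predictor; when it was excluded, body fat, white blood cell count, age, thyroid-stimulating hormone, triglycerides, and low-density lipoprotein cholesterol were the key predictors.
- The prediabetes classifier achieved high sensitivity (0.995) and acceptable calibration (Brier score 0.1754).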
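
The sensitivity and calibration figures for the classifier depend on the probability cut-off and on how closely predicted probabilities track outcomes. The following is a minimal sketch, not the authors' code, of how sensitivity, specificity, and the Brier score are conventionally computed. The function names and the toy labels and probabilities are hypothetical; only the two thresholds (0.2892 and 0.5683) come from the abstract, and treating `prob >= threshold` as a positive prediction is an assumption.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) for binary labels at a probability cut-off.

    Assumption: a case is predicted positive when prob >= threshold.
    """
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.3, 0.4, 0.1]

for cutoff in (0.2892, 0.5683):  # thresholds quoted in the abstract
    sens, spec = sensitivity_specificity(toy_labels, toy_probs, cutoff)
    print(f"threshold={cutoff}: sensitivity={sens:.3f}, specificity={spec:.3f}")

print(f"Brier score: {brier_score(toy_labels, toy_probs):.4f}")
```

Lowering the cut-off trades specificity for sensitivity, which is why a screening-oriented threshold and a balanced threshold can give very different operating points for the same model.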

## Abstract

Background: Early detection of dysglycemia in young adults is important but underexplored. This study aimed to (1) predict long-term changes in fasting plasma glucose (δ-FPG) and (2) classify future prediabetes using complementary machine learning (ML) approaches. Methods: We analyzed 6247 Taiwanese men aged 18–35 years (mean follow-up 5.9 years). For δ-FPG (continuous outcome), random forest, stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and elastic net were compared with multiple linear regression using symmetric mean absolute percentage error (SMAPE), root mean squared error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE). Sensitivity analyses excluded baseline FPG (FPGbase). Shapley additive explanations (SHAP) values provided interpretability, and stability was assessed across 10 repeated train–test cycles with confidence intervals. For prediabetes (binary outcome), an XGBoost classifier was trained on top predictors, with class imbalance corrected by SMOTE-Tomek. Calibration and decision-curve analysis (DCA) were also performed. Results: ML models consistently outperformed regression on all error metrics. FPGbase was the dominant predictor in full models (100% importance). Without FPGbase, key predictors included body fat, white blood cell count, age, thyroid-stimulating hormone, triglycerides, and low-density lipoprotein cholesterol. The prediabetes classifier achieved accuracy 0.788, precision 0.791, sensitivity 0.995, ROC-AUC 0.667, and PR-AUC 0.873. At a high-sensitivity threshold (0.2892), sensitivity reached 99.53% (specificity 47.46%); at a balanced threshold (0.5683), sensitivity was 88.69% and specificity was 90.61%. Calibration was acceptable (Brier 0.1754), and DCA indicated clinical utility. Conclusions: FPGbase is the strongest predictor of glycemic change, but adiposity, inflammation, thyroid status, and lipids remain informative. A dual interpretable ML framework offers clinically actionable tools for screening and risk stratification in young men.
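
The four error metrics named in the Methods can be illustrated with a short sketch. This is an assumption-laden example rather than the study's implementation: the function name and toy values are hypothetical, and SMAPE is written in one common form (mean of 2|y − ŷ| / (|y| + |ŷ|), expressed as a percentage), which may differ from the variant the authors used. RAE and RRSE are taken relative to a baseline that always predicts the mean of the observed values, so a value of 1 means "no better than predicting the mean".

```python
import math


def regression_error_metrics(y_true, y_pred):
    """Compute SMAPE (%), RMSE, RAE, and RRSE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n

    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # SMAPE: pairs where both values are zero contribute 0 (assumed convention).
    smape_terms = [
        2 * abs(t - p) / (abs(t) + abs(p)) if (abs(t) + abs(p)) > 0 else 0.0
        for t, p in zip(y_true, y_pred)
    ]
    smape = 100 * sum(smape_terms) / n

    rmse = math.sqrt(sum(sq_err) / n)

    # Baseline errors of a model that always predicts the observed mean.
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)
    if base_abs == 0 or base_sq == 0:
        raise ValueError("RAE and RRSE are undefined when all observed values are equal")

    rae = sum(abs_err) / base_abs
    rrse = math.sqrt(sum(sq_err) / base_sq)

    return {"SMAPE": smape, "RMSE": rmse, "RAE": rae, "RRSE": rrse}


# Hypothetical toy values for illustration only (not study data).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(regression_error_metrics(observed, predicted))
```

Because RAE and RRSE are normalized by the mean-prediction baseline, they allow comparison across models on the same outcome even when its scale is small, as is typical for a change score such as δ-FPG.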

## Linked entities

- **Diseases:** prediabetes (MONDO:0006920)
- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** adiposity (MESH:D018205), Prediabetes (MESH:D011236), thyroid (MESH:D013966), inflammation (MESH:D007249)
- **Chemicals:** FPG (-), glucose (MESH:D005947), triglycerides (MESH:D014280), lipids (MESH:D008055)
- **Species:** Homo sapiens (taxon 9606)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12524205/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12524205/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/PMC12524205/full.md

---
Source: https://tomesphere.com/paper/PMC12524205