# A machine learning-based prediction of diabetic retinopathy using the Korea national health and nutrition examination survey (2008–2012, 2017–2021)

**Authors:** Min Seok Kim, Young Wook Choi, Borghare Shubham Prakash, Youngju Lee, Soo Lim, Se Joon Woo

PMC · DOI: 10.3389/fmed.2025.1542860 · Frontiers in Medicine · 2025-05-30

## TL;DR

This study trains machine learning models on clinical and survey data from the Korea National Health and Nutrition Examination Survey to predict diabetic retinopathy (DR) in adults with diabetes, without requiring fundus imaging.

## Contribution

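Developed and compared logistic regression and three machine learning models (extreme gradient boosting, decision tree, and random forest) for DR prediction from non-imaging clinical data; the best model reached an AUC of 0.748.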

## Key findings

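- The random forest model, using 16 variables, achieved the highest AUC of 0.748 (95% CI, 0.705–0.790), with sensitivity 0.669, specificity 0.729, and accuracy 0.715.
- Key predictors identified by SHAP included HbA1c, fasting glucose, diabetes duration, and BMI.
- Among 3,026 participants with diabetes, 671 (22.2%) had DR; the models could help identify high-risk patients for timely ophthalmic referral where fundus cameras are not available.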

## Abstract

Machine learning technology that uses available clinical data to predict diabetic retinopathy (DR) can be highly valuable in medical settings where fundus cameras are not accessible.

This study aimed to develop and compare machine learning algorithms for predicting DR without fundus images.

We used data from the Korea National Health and Nutrition Examination Survey (2008–2012 and 2017–2021) and enrolled individuals aged ≥ 20 years with diabetes who underwent fundus examination. Predictive models for DR were developed using logistic regression and three machine learning algorithms: extreme gradient boosting, decision tree, and random forest. Model performance was evaluated using area under the receiver operating characteristic curve (AUC) and accuracy for the diagnosis of DR, and feature importance was determined using Shapley Additive Explanations (SHAP).
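As a rough illustration of this kind of model comparison, the sketch below fits four classifiers on synthetic data and reports AUC and accuracy on a held-out split. It is not the authors' pipeline: the data are generated with `make_classification` (sized to resemble the cohort), scikit-learn's `GradientBoostingClassifier` stands in for extreme gradient boosting, the 80/20 split and all hyperparameters are assumptions, and the SHAP step is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption): 3,026 rows,
# 16 features, and roughly 22% positive (DR) labels.
X, y = make_classification(
    n_samples=3026, n_features=16, n_informative=6,
    weights=[0.778], random_state=0,
)

# Hypothetical 80/20 stratified split; the paper's split is not stated here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative hyperparameters, not the ones used in the study.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model and return its test-set AUC and accuracy."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Predicted probability of the positive class.
        prob = model.predict_proba(X_test)[:, 1]
        # Accuracy at a fixed 0.5 threshold (an assumption).
        pred = (prob >= 0.5).astype(int)
        results[name] = {
            "auc": roc_auc_score(y_test, prob),
            "accuracy": accuracy_score(y_test, pred),
        }
    return results

results = evaluate(models, X_train, y_train, X_test, y_test)
for name, metrics in results.items():
    print(f"{name}: AUC={metrics['auc']:.3f}, accuracy={metrics['accuracy']:.3f}")
```

The numbers printed here describe only the synthetic data and say nothing about DR. On an imbalanced outcome like this one, AUC is usually the more informative of the two metrics, because accuracy depends on the chosen threshold and on prevalence.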

Among the 3,026 diabetic participants (male, 50.7%; mean age, 63.7 ± 10.5 years), 671 (22.2%) had DR. The random forest model, using 16 variables, achieved the highest AUC of 0.748 (95% confidence interval, 0.705–0.790), with a sensitivity of 0.669, a specificity of 0.729, and an accuracy of 0.715. As interpreted by SHAP, HbA1c, fasting glucose levels, duration of diabetes, and body mass index were identified as common key determinants influencing the models' outcomes.
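The reported sensitivity, specificity, and accuracy are linked through prevalence: accuracy is the prevalence-weighted average of sensitivity and specificity. The short check below applies that identity to the figures above. It assumes the evaluation set had the same DR prevalence as the full cohort (671 of 3,026), which the abstract does not state; the function name is illustrative.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)

# Cohort-level DR prevalence from the abstract; assumed to hold
# for the evaluation set as well.
prevalence = 671 / 3026

implied_accuracy = accuracy_from_rates(0.669, 0.729, prevalence)
print(round(implied_accuracy, 3))
```

Under that assumption the implied accuracy comes out at about 0.716, in line with the reported 0.715; the small gap is consistent with rounding or a slightly different prevalence in the evaluation set.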

The DR prediction models using machine learning techniques demonstrated reliable performance even without fundus imaging, with the random forest model showing particularly strong results. These models could assist in managing DR by identifying high-risk patients, enabling timely ophthalmic referrals.

## Linked entities

- **Diseases:** diabetic retinopathy (MONDO:0005266), diabetes (MONDO:0005015)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12163237/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12163237/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12163237/full.md

---
Source: https://tomesphere.com/paper/PMC12163237