# Use of Automated Machine Learning to Detect Undiagnosed Diabetes in US Adults: Development and Validation Study

**Authors:** Jianxiu Liu, Fred Ssewamala, Ruopeng An, Mengmeng Ji

PMC · DOI: 10.2196/68260 · JMIR AI · 2025-10-08

## TL;DR

Automated machine learning (H2O AutoML) applied to NHANES 1999-2020 data detected undiagnosed diabetes in US adults with an area under the ROC curve of 0.909 on the test set.

## Contribution

To the authors' knowledge, this is the first study to apply AutoML to detecting undiagnosed diabetes in US adults.

## Key findings

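- The AutoML model outperformed four traditional machine learning models (logistic regression, support vector machines, random forest, and extreme gradient boosting) in detecting undiagnosed diabetes.
- On the test set, the model achieved an area under the ROC curve of 0.909 (95% CI 0.897-0.921), with specificity of 90.46% and negative predictive value of 92.61%; sensitivity was 70.26% and positive predictive value 64.10%.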
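
One way to read the area under the ROC curve reported above: it is the probability that a randomly chosen participant with undiagnosed diabetes receives a higher model score than a randomly chosen participant without diabetes, with ties counted as half. The sketch below computes that quantity from pairwise comparisons. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code or data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a case (e.g. undiagnosed diabetes), 0 for a non-case.
    scores: model outputs, higher meaning "more likely a case".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 2 cases, 3 non-cases.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise form is quadratic in sample size, so it suits illustration rather than large datasets; rank-based implementations give the same value more efficiently.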

## Abstract

Early diagnosis of diabetes is essential for early interventions to slow the progression of dysglycemia and its comorbidities. However, among individuals with diabetes, about 23% were unaware of their condition.

This study aims to investigate the potential use of automated machine learning (AutoML) models and self-reported data in detecting undiagnosed diabetes among US adults.

Individual-level data, including biochemical tests for diabetes, demographic characteristics, family history of diabetes, anthropometric measures, dietary intakes, health behaviors, and chronic conditions, were retrieved from the National Health and Nutrition Examination Survey, 1999‐2020. Undiagnosed diabetes was defined as having no prior self-reported diagnosis but meeting diagnostic criteria for elevated hemoglobin A1c, fasting plasma glucose, or 2-hour plasma glucose. The H2O AutoML framework, which allows for automated hyperparameter tuning, model selection, and ensemble learning, was used to automate the machine learning workflow. For comparative analysis, 4 traditional machine learning models—logistic regression, support vector machines, random forest, and extreme gradient boosting—were implemented. Model performance was evaluated using the area under the receiver operating characteristic curve.
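
The outcome definition above can be sketched as a small labeling function. This summary does not list the numeric cutoffs, so the sketch assumes the standard American Diabetes Association thresholds (HbA1c of 6.5% or higher, fasting plasma glucose of 126 mg/dL or higher, 2-hour plasma glucose of 200 mg/dL or higher). The function names, argument names, and label strings are hypothetical, and treating a missing test as "criterion not met" is also an assumption.

```python
from typing import Optional

# Assumed ADA diagnostic cutoffs; not stated in this summary.
HBA1C_CUTOFF_PCT = 6.5
FPG_CUTOFF_MG_DL = 126.0
OGTT_2H_CUTOFF_MG_DL = 200.0


def meets_biochemical_criteria(
    hba1c_pct: Optional[float],
    fpg_mg_dl: Optional[float],
    ogtt_2h_mg_dl: Optional[float],
) -> bool:
    """True if any available test is at or above its cutoff."""
    checks = [
        (hba1c_pct, HBA1C_CUTOFF_PCT),
        (fpg_mg_dl, FPG_CUTOFF_MG_DL),
        (ogtt_2h_mg_dl, OGTT_2H_CUTOFF_MG_DL),
    ]
    # A missing measurement (None) is treated as not meeting the criterion.
    return any(v is not None and v >= cutoff for v, cutoff in checks)


def label_participant(
    self_reported_diagnosis: bool,
    hba1c_pct: Optional[float] = None,
    fpg_mg_dl: Optional[float] = None,
    ogtt_2h_mg_dl: Optional[float] = None,
) -> Optional[str]:
    """Assign an analysis group, or None for previously diagnosed participants."""
    if self_reported_diagnosis:
        return None  # already diagnosed: outside the two comparison groups
    if meets_biochemical_criteria(hba1c_pct, fpg_mg_dl, ogtt_2h_mg_dl):
        return "undiagnosed_diabetes"
    return "no_diabetes"


print(label_participant(False, hba1c_pct=6.8))                   # elevated HbA1c
print(label_participant(False, hba1c_pct=5.4, fpg_mg_dl=98.0))   # all below cutoffs
print(label_participant(True, hba1c_pct=7.2))                    # prior diagnosis
```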

The study included 11,815 participants aged 20 years and older, comprising 2256 participants with undiagnosed diabetes and 9559 without diabetes. The average age was 59.76 (SD 15.0) years for participants with undiagnosed diabetes and 46.78 (SD 17.2) years for those without diabetes. The AutoML model demonstrated superior performance compared with the 4 traditional machine learning models. The trained AutoML model achieved an area under the receiver operating characteristic curve of 0.909 (95% CI 0.897-0.921) in the test set. The model demonstrated a sensitivity of 70.26%, specificity of 90.46%, positive predictive value of 64.10%, and negative predictive value of 92.61% for identifying undiagnosed diabetes from nondiabetes.
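
Positive and negative predictive values depend on how common the condition is in the screened group, not only on sensitivity and specificity. The sketch below applies Bayes' rule to the reported sensitivity and specificity. It assumes the prevalence in the evaluated group equals the whole-sample proportion (2256 of 11,815); the actual test-set composition is not given in this summary, and the function name `predictive_values` is hypothetical.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence                # true positives (as a fraction)
    fn = (1.0 - sensitivity) * prevalence        # false negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    tn = specificity * (1.0 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Assumption: prevalence taken from the full analytic sample.
prevalence = 2256 / 11815
ppv, npv = predictive_values(0.7026, 0.9046, prevalence)
print(f"prevalence = {prevalence:.3f}, PPV = {ppv:.3f}, NPV = {npv:.3f}")

# Same model characteristics at a hypothetical 5% prevalence.
ppv_low, npv_low = predictive_values(0.7026, 0.9046, 0.05)
print(f"at 5% prevalence: PPV = {ppv_low:.3f}, NPV = {npv_low:.3f}")
```

Under the assumed prevalence the computed values land near, but not exactly on, the reported 64.10% and 92.61%, which is expected if the test split's case mix differs slightly from the full sample. The second call illustrates why positive predictive value would fall if the same model were applied to a group where undiagnosed diabetes is rarer.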

To our knowledge, this study is the first to utilize the AutoML model for detecting undiagnosed diabetes in US adults. The model’s strong performance and applicability to the broader US population make it a promising tool for large-scale diabetes screening efforts.

## Linked entities

- **Diseases:** diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** Diabetes (MESH:D003920)
- **Chemicals:** glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12532270/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12532270/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC12532270/full.md

---
Source: https://tomesphere.com/paper/PMC12532270