# Artificial intelligence survival models for identifying relevant risk factors for incident diabetes in Azar cohort population

**Authors:** Neda Gilani, Mohammadhossein Somi, Farzaneh Hamidi, Pasqualina Santaguida, Elnaz Faramarzi, Reza Arabi Belaghi

PMC · DOI: 10.34172/hpp.025.43105 · Health Promotion Perspectives · 2025-05-06

## TL;DR

This study applies AI survival models (random forest survival analysis and LASSO-Cox regression) to Azar Cohort data from 2014 to 2020 to identify risk factors for incident type II diabetes in East Azerbaijan, Iran.

## Contribution

The study makes novel use of AI survival models, specifically random forest survival analysis and LASSO-Cox regression, to identify diabetes risk factors in a specific Iranian population cohort.

## Key findings

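- Waist circumference (WC), mean corpuscular hemoglobin concentration (MCHC), and hypertension were identified as key risk factors for diabetes by both the random forest and LASSO-Cox analyses.
- The random forest model was slightly more accurate, with a C-index of 79.5% versus 76.3% for the LASSO-Cox model.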
- 21 variables were identified as important predictors by the random forest analysis.

## Abstract

This study aimed to identify risk factors associated with time to onset of type II diabetes using artificial intelligence (AI) survival models (SM) in a population cohort from East Azerbaijan, Iran.

Data from the Azar Cohort, spanning 2014 to 2020, were analyzed using the random forest (RF) variable selection method along with Cox regression to identify the most relevant risk factors associated with diabetes. We then developed prediction models using RF survival analysis. LASSO variable selection and RF variable selection were used to select the most important variables. The concordance index (C-index) was used to evaluate the discriminative performance of the prediction models.
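To make the evaluation metric concrete, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `harrell_c_index`, the toy follow-up times, event flags, and risk scores are all hypothetical, and pairs with tied event times are simply skipped.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs in which the subject with the earlier
    observed event also has the higher predicted risk (risk ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:  # subject j was still event-free at times[i]
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event = diabetes diagnosed.
times = [2.0, 4.0, 5.0, 6.0, 7.0]
events = [True, False, True, True, False]
risk_scores = [0.9, 0.3, 0.4, 0.7, 0.1]

c = harrell_c_index(times, events, risk_scores)
print(f"C-index: {c:.3f}")  # 6 of 7 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so C-indices reported as percentages are this fraction multiplied by 100.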

Our LASSO-Cox regression identified six factors as significantly associated with diabetes: age, mean corpuscular hemoglobin concentration (MCHC), waist circumference (WC), body mass index (BMI), use of sleep medication, and hypertension (stage 1 and stage 2). The model that included all variables had a C-index of 76.3%. In contrast, the RF analysis identified 21 important variables predicting a higher probability of having diabetes. Of those, WC, MCHC, triglyceride, and age were the most important predictors of diabetes. The RF model converged after 500 trees with an out-of-bag (OOB) error of 0.28 and a C-index of 79.5%.
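For readers unfamiliar with why LASSO-Cox regression retains only a subset of candidate variables, the standard textbook form of the L1-penalized Cox partial log-likelihood is sketched below. This is the generic formulation (Breslow form, assuming no tied event times), not necessarily the exact parameterization the authors used, and the symbols are assumptions introduced here: $x_i$ is the covariate vector of subject $i$, $\delta_i$ the event indicator, $t_i$ the observed time, $R(t_i)$ the set of subjects still at risk at $t_i$, $p$ the number of candidate covariates, and $\lambda \ge 0$ the penalty weight.

```latex
\hat{\beta} = \arg\max_{\beta} \left\{
  \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta
    - \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

As $\lambda$ grows, more coefficients $\beta_k$ are shrunk exactly to zero, and the covariates with nonzero coefficients form the selected set. The abstract does not state how $\lambda$ was chosen.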

RF machine learning algorithms and LASSO-Cox regression analyses consistently identified WC, hypertension, and MCHC as the main risk factors for developing diabetes. The RF approach demonstrated slightly better accuracy in predicting the likelihood of diabetes at different time points.

## Linked entities

- **Diseases:** diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** hypertension (MESH:D006973), diabetes (MESH:D003920), diabetes type II (MESH:D003924)
- **Chemicals:** triglyceride (MESH:D014280)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12125507/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12125507/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12125507/full.md

---
Source: https://tomesphere.com/paper/PMC12125507