# Enhanced diabetes prediction using CTGAN-MLP approach on body composition data

**Authors:** Javad Hassannataj Joloudari, Mohammad Maftoun, Mohammad Ali Nematollahi, Kandala N. V. P. S. Rajesh, S. Prasanth Vaidya, Kamireddy Rasool Reddy, Pirhossein Kolivand

PMC · DOI: 10.1038/s41598-025-31928-9 · Scientific Reports · 2025-12-10

## TL;DR

This study improves diabetes prediction by pairing a Conditional Tabular GAN (CTGAN) with a Multilayer Perceptron (MLP) to better handle class-imbalanced body composition data.

## Contribution

The CTGAN-MLP framework, which uses CTGAN-generated synthetic samples to mitigate class imbalance, outperformed the other models evaluated for diabetes prediction.

## Key findings

- The CTGAN-MLP model achieved 93.91% accuracy and 93.87% AUC in diabetes prediction.
- SHAP analysis highlighted fat percentage, fat-free mass, and basal metabolic rate as key predictors.
- The model achieved the highest performance among the evaluated models under stratified 5-fold cross-validation.
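
As a reminder of what the two headline metrics measure, here is a minimal Python sketch. The helper names `accuracy` and `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's code or data. AUC is computed here as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only; these are NOT results from the paper.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.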

## Abstract

Accurate diabetes risk prediction is essential for timely intervention and effective disease management. This study evaluates a prediction framework that uses a Conditional Tabular Generative Adversarial Network (CTGAN) to generate additional synthetic samples and mitigate class imbalance. Unlike interpolation-based oversampling techniques, CTGAN models the underlying data distribution and may better preserve nonlinear relationships among body composition variables. When combined with a Multilayer Perceptron (MLP), this approach enables the model to capture complex feature interactions that could be relevant for distinguishing individuals with diabetes from healthy participants. In our experiments, the CTGAN-augmented MLP achieved an accuracy of 93.91%, an AUC of 93.87%, a precision of 94.48%, and an F1-score of 93.89% under stratified 5-fold cross-validation, the highest performance among the evaluated models. SHapley Additive exPlanations (SHAP) analysis was further employed to enhance interpretability and provided insight into the contribution of key predictors such as fat percentage, fat-free mass, and basal metabolic rate.

The online version contains supplementary material available at 10.1038/s41598-025-31928-9.
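
The evaluation protocol described in the abstract (class balancing with synthetic samples, then stratified 5-fold cross-validation) can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions, not the authors' implementation: `duplicate_minority` is a hypothetical placeholder where a CTGAN sampler would go, `fit_threshold` stands in for the MLP, the data are made up, and applying augmentation to the training folds only is an assumption this summary does not confirm.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds


def cross_validate(X, y, fit, predict, augment, k=5, seed=0):
    """Return per-fold accuracy; `augment` only ever sees training rows."""
    scores = []
    for held_out in stratified_folds(y, k, seed):
        test_set = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = augment([X[i] for i in train_idx],
                             [y[i] for i in train_idx])
        model = fit(X_tr, y_tr)
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def duplicate_minority(X_tr, y_tr):
    """Placeholder balancer (assumption): repeats minority rows.
    A CTGAN-style generator would supply synthetic rows here instead."""
    counts = {c: y_tr.count(c) for c in set(y_tr)}
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [x for x, c in zip(X_tr, y_tr) if c == minority]
    extra = [pool[i % len(pool)] for i in range(deficit)]
    return X_tr + extra, y_tr + [minority] * deficit


def fit_threshold(X_tr, y_tr):
    """Stand-in for the MLP: midpoint between the two class means."""
    def mean(vals):
        return sum(vals) / len(vals)
    neg = mean([x[0] for x, c in zip(X_tr, y_tr) if c == 0])
    pos = mean([x[0] for x, c in zip(X_tr, y_tr) if c == 1])
    return (neg + pos) / 2


def predict_threshold(threshold, x):
    return int(x[0] > threshold)


# Made-up, imbalanced one-feature data (40 negatives, 10 positives).
X = [[float(i)] for i in range(40)] + [[100.0 + i] for i in range(10)]
y = [0] * 40 + [1] * 10

fold_scores = cross_validate(X, y, fit_threshold, predict_threshold,
                             duplicate_minority, k=5)
```

Keeping the balancing step inside the loop means held-out rows never influence the synthetic samples, which avoids one common source of optimistic bias; whether the paper follows this exact ordering should be checked against the full text.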

## Linked entities

- **Diseases:** diabetes (MONDO:0005015)

## Full-text entities

- **Diseases:** diabetes (MESH:D003920)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12808130/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12808130/full.md

## References

5 references — full list in the complete paper: https://tomesphere.com/paper/PMC12808130/full.md

---
Source: https://tomesphere.com/paper/PMC12808130