# Early Detection of Chronic Kidney Disease in Men Using Lifestyle and Demographic Indicators: A Machine Learning Approach for Primary Healthcare Settings

**Authors:** Mc Neil Valencia, Jun Kim, Zeeshan Abbas, Seung Won Lee

PMC · DOI: 10.3390/healthcare14030405 · Healthcare · 2026-02-05

## TL;DR

This study uses explainable machine learning to predict early chronic kidney disease (CKD) risk in middle-aged men by combining lifestyle, sociodemographic, and biochemical data from a public health survey, aiming to support early prevention and personalized care.

## Contribution

The novel contribution is an explainable machine learning framework for CKD risk prediction in middle-aged men using public health survey data.

## Key findings

- AdaBoost achieved the best performance with an accuracy of 0.7258 and F1-score of 0.6457.
- Serum creatinine, blood urea nitrogen, urinary creatinine, and age were identified as major predictors of CKD.
- Lifestyle and metabolic indicators such as BMI, sodium and sugar intake, and sleep duration were secondary predictors.
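
The reported F1-score can be related to precision and recall through the standard definition F1 = 2PR / (P + R). The sketch below rearranges that identity to estimate the precision implied by the AdaBoost F1-score (0.6457) and the recall given in the abstract (0.6923). The function name `implied_precision` is hypothetical, and the resulting value is an arithmetic consequence of the two reported numbers, not a figure stated in this summary.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    denominator = 2.0 * recall - f1
    if denominator <= 0:
        # No positive precision satisfies the identity for these inputs.
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denominator


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Values reported for AdaBoost in this summary.
reported_f1 = 0.6457
reported_recall = 0.6923

precision = implied_precision(reported_f1, reported_recall)
print(f"implied precision: {precision:.3f}")  # roughly 0.605
```

A precision below the recall would mean the model flags somewhat more false positives than it misses true cases, a trade-off that is often acceptable in a screening context.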

## Abstract

Background/Objective: Chronic kidney disease (CKD) is a major global health concern associated with significant morbidity, mortality, and healthcare burden. This study aimed to develop an explainable machine learning framework that integrates lifestyle, sociodemographic, and biochemical factors for early CKD risk prediction among middle-aged men using public health survey data. Methods: Data from 968 male participants were preprocessed by removing missing values, deriving eGFR and ACR, and labeling CKD status. Five machine learning algorithms (Random Forest, AdaBoost, Naïve Bayes, SVM, and XGBoost) were trained and evaluated using accuracy, precision, recall, and F1-score. Model interpretability was assessed using SHAP, LIME, Boruta, and Pearson’s correlation analyses. Results: AdaBoost yielded the best performance (accuracy = 0.7258, F1 = 0.6457, recall = 0.6923), with robust generalization confirmed by the precision–recall curve (AP = 0.715). SHAP and LIME revealed that serum creatinine, blood urea nitrogen, urinary creatinine, and age were major predictors, whereas lifestyle and metabolic indicators such as BMI, sodium and sugar intake, and sleep duration emerged as secondary factors for CKD. Conclusions: This study demonstrates the effectiveness of an explainable machine learning model that integrates lifestyle, sociodemographic, and biochemical data for early CKD prediction among middle-aged men. The AdaBoost-based framework shows strong potential for implementation as a clinical decision-support tool within EHR systems and may contribute to personalized and preventive interventions. The study also emphasizes the growing importance of modifiable behaviors in kidney disease development and supports future work involving multiple cohorts and temporal model expansion to improve risk stratification for individuals at risk of kidney disease.
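
The preprocessing step described in the Methods (deriving eGFR and ACR, then labeling CKD status) can be sketched as follows. This summary does not say which eGFR equation or which cutoffs the authors used, so the sketch assumes the 2021 CKD-EPI creatinine equation for men and the commonly used thresholds of eGFR below 60 mL/min/1.73 m² or ACR of at least 30 mg/g. All function names, input units, and thresholds here are illustrative assumptions, not the paper's confirmed method.

```python
def egfr_ckd_epi_2021_male(serum_creatinine_mg_dl: float, age_years: float) -> float:
    """eGFR (mL/min/1.73 m^2) from the 2021 CKD-EPI creatinine equation, male constants.

    Assumption: the paper may have used a different equation.
    """
    kappa, alpha = 0.9, -0.302  # male-specific constants
    ratio = serum_creatinine_mg_dl / kappa
    return (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )


def acr_mg_per_g(urine_albumin_mg_l: float, urine_creatinine_mg_dl: float) -> float:
    """Albumin-to-creatinine ratio in mg/g (input units are assumed)."""
    urine_creatinine_g_l = urine_creatinine_mg_dl * 0.01  # mg/dL -> g/L
    return urine_albumin_mg_l / urine_creatinine_g_l


def label_ckd(egfr: float, acr: float) -> bool:
    """Assumed labeling rule: reduced filtration or elevated albuminuria."""
    return egfr < 60.0 or acr >= 30.0


# Hypothetical participant, not taken from the study data.
egfr = egfr_ckd_epi_2021_male(serum_creatinine_mg_dl=2.0, age_years=50)
acr = acr_mg_per_g(urine_albumin_mg_l=10.0, urine_creatinine_mg_dl=100.0)
print(f"eGFR={egfr:.1f}, ACR={acr:.1f}, CKD={label_ckd(egfr, acr)}")
```

If labels are defined this way, serum creatinine and age feed directly into the eGFR used for labeling, which is worth keeping in mind when interpreting their high feature importance.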

## Linked entities

- **Diseases:** chronic kidney disease (MONDO:0005300)

## Full-text entities

- **Diseases:** kidney disease (MESH:D007674), CKD (MESH:D051436)
- **Chemicals:** creatinine (MESH:D003404), sodium (MESH:D012964), sugar (MESH:D000073893)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12897370/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12897370/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/PMC12897370/full.md

---
Source: https://tomesphere.com/paper/PMC12897370