# A Cross-Ethnicity Validated Machine Learning Model for the Progression of Chronic Kidney Disease in Individuals over 50 Years Old

**Authors:** Langkun Wang, Wei Zhang, Xin Zhong, Peng Dou, Yuwei Wu, Xiaonan Zheng, Peng Zhang

PMC · DOI: 10.3390/jcm15020825 · 2026-01-20

## TL;DR

An XGBoost model for predicting chronic kidney disease progression in adults over 50 was trained on a Chinese cohort (CHARLS) and externally validated in English (ELSA) and US (HRS) cohorts, reaching AUCs of 0.867 and 0.871 and outperforming the Kidney Failure Risk Equation (AUC 0.745).

## Contribution

A machine learning model for CKD progression, validated across ethnicities, that integrates traditional risk factors with composite health indicators (frailty index, TyG index, AISI).

## Key findings

- The XGBoost model achieved an AUC of 0.892 in training and maintained performance in external validation (AUC 0.867 in ELSA, 0.871 in HRS).
- The frailty index (FI) was identified as the most influential predictor using SHAP analysis.
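
To make the AUC figures above concrete, here is a minimal sketch of how an AUC can be computed from predicted risk scores. It uses the rank interpretation of the AUC (the probability that a randomly chosen progressor receives a higher score than a randomly chosen non-progressor). The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not the paper's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for CKD progression, 0 for no progression (hypothetical coding).
    scores: predicted risk, higher meaning more likely to progress.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example only: six made-up individuals, not study data.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in cohort size, which is fine for illustration; for cohorts the size of those in the paper, a sort-based routine from an established library would be the usual choice.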

## Abstract

Background/Objectives: Chronic Kidney Disease (CKD) is a global public health burden with a rising prevalence driven by population aging. Existing prediction models, such as the Kidney Failure Risk Equation (KFRE), often lack generalizability across ethnicities and comprehensive systemic indicators. This study aimed to develop and validate a machine learning model for predicting CKD progression by integrating traditional risk factors with novel composite indicators reflecting systemic health.

Methods: Data from the China Health and Retirement Longitudinal Study (CHARLS, n = 2500) were used for model training. External validation was performed using independent cohorts from the English Longitudinal Study of Ageing (ELSA, n = 1200) and the Health and Retirement Study (HRS, n = 1500). Multiple machine learning algorithms, including XGBoost, were employed. Feature engineering incorporated composite indicators such as the frailty index (FI), triglyceride–glucose (TyG) index, and aggregate index of systemic inflammation (AISI).

Results: The XGBoost model achieved an area under the curve (AUC) of 0.892 in the training set and maintained robust performance in external validation (AUC 0.867 in ELSA, 0.871 in HRS), outperforming the KFRE (AUC 0.745). SHAP analysis identified the FI as the most influential predictor. Decision curve analysis confirmed the model’s clinical utility.

Conclusions: This machine learning model demonstrates high accuracy and cross-ethnicity validity, offering a practical tool for early intervention and personalized CKD management. Future work should address limitations such as the retrospective design and expand validation to underrepresented regions.
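
The abstract names three composite indicators (FI, TyG index, AISI) but does not give their formulas. The sketch below uses the definitions most commonly found in the literature, which is an assumption and may differ from the paper's exact feature engineering: TyG as the natural log of fasting triglycerides times fasting glucose (both in mg/dL) divided by 2, AISI as neutrophils × monocytes × platelets / lymphocytes, and FI as the proportion of assessed health deficits that are present. All function and parameter names are hypothetical.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index: ln(TG [mg/dL] * glucose [mg/dL] / 2).

    Assumed common definition; the paper's exact formula is not in the abstract.
    """
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("lab values must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def aisi(neutrophils, monocytes, platelets, lymphocytes):
    """Aggregate index of systemic inflammation.

    Assumed common definition: neutrophils * monocytes * platelets / lymphocytes,
    with all counts in the same unit (for example 10^9 cells/L).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils * monocytes * platelets / lymphocytes


def frailty_index(deficits):
    """Deficit-accumulation frailty index: mean of scored deficits.

    deficits: iterable of values in [0, 1] (1 = deficit fully present),
    with None marking items that were not assessed and are skipped.
    """
    scored = [d for d in deficits if d is not None]
    if not scored:
        raise ValueError("no deficits were assessed")
    return sum(scored) / len(scored)


# Made-up values for one hypothetical participant, not study data.
example_tyg = tyg_index(150.0, 100.0)           # ln(7500)
example_aisi = aisi(4.0, 0.5, 250.0, 2.0)       # 4 * 0.5 * 250 / 2
example_fi = frailty_index([1, 0, 0.5, None, 1])
```

Each function returns a single number per participant, so the three values could sit alongside traditional risk factors as columns in a feature table before model fitting.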

## Linked entities

- **Diseases:** Chronic Kidney Disease (MONDO:0005300)

## Full-text entities

- **Diseases:** Kidney Failure (MESH:D051437), systemic inflammation (MESH:D007249), CKD (MESH:D051436)
- **Chemicals:** triglyceride (MESH:D014280)

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12841880/full.md

---
Source: https://tomesphere.com/paper/PMC12841880