# Interpretable machine learning model based on routine metabolic laboratory indices to identify advanced chronic kidney disease

**Authors:** Baoye Ye, Xikui Zhang, Weikun Zhu, Zhongfu Xiao, Shuhui Huang

PMC · DOI: 10.3389/fendo.2026.1776419 · Frontiers in Endocrinology · 2026-03-18

## TL;DR

This study develops an interpretable machine learning model using routine metabolic lab data to accurately identify advanced chronic kidney disease.

## Contribution

The novel contribution is an interpretable Gradient Boosting model that detects advanced CKD from routine metabolic indices without requiring albuminuria measurements.

## Key findings

- The Gradient Boosting model achieved high discrimination (AUC = 0.972 internally; 0.965 externally) for advanced CKD detection.
- Key metabolic predictors included urea, phosphorus, albumin, and lipid-related parameters.
- The model shows potential for integration into electronic health records for risk stratification in specialist care.

## Abstract

Early identification of advanced chronic kidney disease (CKD), a condition accompanied by profound metabolic and endocrine disturbances, is essential for timely nephrology referral and intervention. However, widely used risk equations often require albuminuria or repeated measurements that are not consistently available in routine clinical practice.

We retrospectively analyzed adult patients from three departments affiliated with one university, comprising two independent hospitals and one clinic department. Routinely collected demographic, clinical, and metabolic laboratory variables were used to develop machine learning models for distinguishing preserved kidney function (CKD G1–2) from advanced stages (G3a–5). Five algorithms were trained and internally validated in a development cohort, followed by external validation in an independent cohort. Model performance was assessed by discrimination and calibration, and interpretability was examined using feature importance and SHAP (Shapley Additive Explanations).
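The binary target described above can be made concrete with the KDIGO GFR categories, in which G1–2 corresponds to an eGFR of at least 60 mL/min/1.73 m² and G3a–5 to an eGFR below 60. The sketch below is illustrative only: this summary does not state how the authors assigned stages, and the function names `kdigo_gfr_category` and `is_advanced_ckd` are hypothetical.

```python
def kdigo_gfr_category(egfr: float) -> str:
    """Map an eGFR value (mL/min/1.73 m^2) to its KDIGO GFR category.

    Assumption: staging uses eGFR thresholds alone; the paper's exact
    labeling procedure is not given in this summary.
    """
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def is_advanced_ckd(egfr: float) -> bool:
    """Return True for G3a-5 (the 'advanced' class), False for G1-2."""
    return kdigo_gfr_category(egfr) not in {"G1", "G2"}


# Hypothetical eGFR values, not taken from the study.
example_egfr = [102.0, 74.5, 58.0, 33.2, 12.0]
labels = [int(is_advanced_ckd(v)) for v in example_egfr]
```

Under this assumed rule, the positive class is simply every patient whose eGFR falls below 60.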

Among 308 patients in the development cohort and 52 in the external cohort, the Gradient Boosting classifier achieved the best discrimination (AUC = 0.972 internally; 0.965 externally) with good calibration. Urea, kidney disease type, phosphorus, albumin, and lipid-related parameters, which reflect systemic metabolic dysregulation, emerged as key contributors to model predictions.
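Discrimination is reported here as AUC, which can be read as the probability that a randomly chosen patient with advanced CKD receives a higher predicted score than a randomly chosen patient with preserved kidney function, with ties counted as half. A minimal sketch of that pairwise definition follows; `roc_auc` is a hypothetical helper, and the labels and scores are made-up toy values, not study data.

```python
def roc_auc(labels, scores):
    """Compute AUC as the fraction of correctly ordered positive/negative pairs.

    labels: iterable of 0/1 (1 = advanced CKD in this illustration)
    scores: iterable of predicted scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    ordered = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                ordered += 1.0   # positive ranked above negative
            elif p == n:
                ordered += 0.5   # ties count as half
    return ordered / (len(pos) * len(neg))


# Toy example (hypothetical values): one negative outranks one positive.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.10, 0.35, 0.60, 0.55, 0.80, 0.95]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is chosen for readability; for large cohorts a rank-based formulation or a library routine would be the more practical choice.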

An interpretable Gradient Boosting model leveraging routinely measured metabolic laboratory data accurately identifies advanced CKD and captures clinically meaningful metabolic patterns associated with disease severity. These results support its potential integration into electronic health records for risk stratification and identification of advanced CKD among patients with established CKD in specialist care.

## Linked entities

- **Chemicals:** urea (PubChem CID 1176), phosphorus (PubChem CID 139579)
- **Diseases:** chronic kidney disease (MONDO:0005300)

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** albuminuria (MESH:D000419), CKD (MESH:D051436), Urea (MESH:D056806), metabolic (MESH:D008659), endocrine disturbances (MESH:D004700), kidney disease (MESH:D007674)
- **Chemicals:** phosphorus (MESH:D010758), lipid (MESH:D008055)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13038529/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13038529/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/PMC13038529/full.md

---
Source: https://tomesphere.com/paper/PMC13038529