# Machine learning for predicting CKD stages in patients with autosomal dominant polycystic kidney disease: a nationwide cohort study in Japan

**Authors:** Yosuke Shimada, Hiroshi Kataoka, Saori Nishio, Junichi Hoshino, Keiju Hiromura, Yoshitaka Isaka, Satoru Muto

PMC · DOI: 10.1038/s41598-026-39885-7 · 2026-02-13

## TL;DR

This study uses machine learning to predict CKD stages in 2,737 patients with ADPKD from a Japanese nationwide cohort, identifying eGFR, serum creatinine, and total kidney volume among the key predictors.

## Contribution

This study presents a novel application of random forest machine learning to predict CKD stages in patients with ADPKD, using a nationwide Japanese cohort.

## Key findings

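- Random forest achieved the highest predictive accuracy of the three models evaluated (random forest, support vector machine, and naïve Bayes) for predicting CKD stages in ADPKD patients.
- The most significant predictors were eGFR, serum creatinine, CKD heat map, urinary protein, and total kidney volume.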

## Abstract

Machine learning (ML) is a valuable tool in healthcare, enabling the prediction of disease progression through data-driven regression and nonlinear modeling. Unlike traditional statistical methods, ML can identify complex interactions among explanatory variables. Autosomal dominant polycystic kidney disease (ADPKD) is a common cause of chronic kidney disease (CKD), often progressing to end-stage renal failure. Accurately predicting CKD progression in ADPKD patients is essential for personalized treatment strategies. This study analyzed data from 2,737 patients with ADPKD enrolled in the Japanese Nationwide Cohort. Using this dataset, we developed ML models to predict CKD stages. Feature importance analysis was performed to identify key predictive variables. Three ML models—random forest, support vector machine, and naïve Bayes—were evaluated for their predictive accuracy. Random forest exhibited the highest predictive accuracy among the models tested. Feature importance analysis identified estimated glomerular filtration rate (eGFR), serum creatinine, CKD heat map, urinary protein, and total kidney volume as the most significant predictors of CKD stage. As a nonlinear model, random forest effectively captured complex interactions between variables, outperforming the linear support vector machine. The naïve Bayes model, despite assuming independence among variables, surpassed the linear model, indicating limited interdependence among some predictors. ML, particularly random forest, provides a robust approach for predicting CKD stages in patients with ADPKD by accounting for nonlinear variable relationships. These findings emphasize ML’s potential in personalized CKD management and highlight the need for individualized treatment approaches.
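The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses purely synthetic data from `make_classification`, and labels the columns with hypothetical placeholder names taken from the abstract. The hyperparameters, the three-class target, and the train/test split are all assumptions, so the resulting accuracies and importances carry no clinical meaning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column labels borrowed from the abstract; the values are synthetic.
FEATURES = [
    "egfr",
    "serum_creatinine",
    "ckd_heat_map",
    "urinary_protein",
    "total_kidney_volume",
]


def compare_models(seed=0):
    """Fit three classifiers on synthetic data; return accuracies and RF importances."""
    # Synthetic stand-in for a cohort table: 3 classes play the role of CKD stages.
    X, y = make_classification(
        n_samples=600,
        n_features=len(FEATURES),
        n_informative=4,
        n_redundant=1,
        n_classes=3,
        n_clusters_per_class=1,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    models = {
        # Nonlinear ensemble of decision trees.
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # Linear-kernel SVM; features are standardized first.
        "linear_svm": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        # Gaussian naive Bayes assumes conditionally independent features.
        "naive_bayes": GaussianNB(),
    }

    accuracy = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracy[name] = model.score(X_test, y_test)  # held-out accuracy

    # Impurity-based importances from the fitted forest; they sum to 1.
    forest = models["random_forest"]
    importances = dict(zip(FEATURES, forest.feature_importances_))
    return accuracy, importances


if __name__ == "__main__":
    accuracy, importances = compare_models()
    for name, acc in accuracy.items():
        print(f"{name}: {acc:.3f}")
    for feature, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
        print(f"{feature}: {weight:.3f}")
```

Impurity-based importance is only one way to rank features; the abstract does not say which importance measure the authors used, so permutation importance or another method may be closer to their analysis.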

## Linked entities

- **Diseases:** autosomal dominant polycystic kidney disease (MONDO:0004691), chronic kidney disease (MONDO:0005300), end-stage renal failure (MONDO:0004375)

## Full-text entities

- **Diseases:** autosomal dominant polycystic kidney disease (MESH:D016891), CKD (MESH:D012080)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12982517/full.md

---
Source: https://tomesphere.com/paper/PMC12982517