# Machine Learning Prediction of Progression to Dialysis in Patients With Polycystic Kidney Disease: Population-Based Retrospective Cohort Study

**Authors:** Cheng-Hao Chang, Mingchih Chen, Ming-Hsien Tsai, Yen-Chun Huang, Hung-Hsiang Liou, Ben-Chang Shia, Chingying Liang, Yu-Wei Fang

PMC · DOI: 10.2196/80343 · JMIR Medical Informatics · 2026-03-16

## TL;DR

This study uses machine learning to predict which patients with a genetic kidney disease will need dialysis, which could help doctors monitor high-risk individuals more closely.

## Contribution

A machine learning model that uses nationwide administrative data to predict progression to dialysis in patients with autosomal dominant polycystic kidney disease (ADPKD), evaluated on a temporally held-out test set.

## Key findings

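- The XGBoost model (27 selected features) performed best on the held-out temporal test set: accuracy 98.3%, AUC 0.955, F1-score 0.800.
- Age 66 years and older, anemia, and cardiovascular disease markers were among the strongest predictors of dialysis risk.
- Medication use (eg, anticoagulants, loop diuretics, febuxostat) was highly influential, but medication-related predictors are best read as proxies for disease complexity rather than direct risk factors.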

## Abstract

Autosomal dominant polycystic kidney disease (ADPKD), characterized by progressive cyst growth and renal decline, is the leading genetic cause of end‐stage renal disease.

This study aims to develop and validate machine learning (ML) models for predicting the risk of progression to dialysis in patients with ADPKD using a nationwide administrative database. Early identification of high-risk patients is critical for timely monitoring.

This retrospective cohort study used data from Taiwan’s National Health Insurance Research Database (2007‐2018) to identify newly diagnosed patients with ADPKD. Six ML algorithms, including logistic regression, random forest, and extreme gradient boosting (XGBoost), were employed to predict progression to dialysis. Models were developed using 10-fold cross-validation, with the Synthetic Minority Oversampling Technique applied within training folds to address class imbalance. An ensemble-based feature selection strategy was implemented to identify the most robust predictors and optimize final model performance. Model evaluation was conducted using a strict temporal split.
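The ordering described above matters for avoiding leakage: split by time first, then oversample the minority class only inside each cross-validation training fold, so that no synthetic records are derived from validation or test patients. The following is a minimal stdlib sketch of that ordering under stated assumptions. The record layout, the `index_year` field, the cutoff year, the toy data, and all helper names (`temporal_split`, `kfold_indices`, `smote_like`, `balance_training_fold`, `cross_validate`) are hypothetical, and `smote_like` is a simplified SMOTE-style interpolator written for illustration, not the authors' code.

```python
import random

def temporal_split(records, cutoff_year):
    """Hypothetical temporal split: earlier index years train, later years test."""
    train = [r for r in records if r["index_year"] < cutoff_year]
    test = [r for r in records if r["index_year"] >= cutoff_year]
    return train, test

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(minority, n_new, k_neighbors=5, seed=0):
    """Simplified SMOTE-style oversampling: pick a minority sample, pick one of
    its k nearest minority neighbours, and interpolate a point between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: sq_dist(base, m))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

def balance_training_fold(X, y, seed=0):
    """Oversample the minority class (label 1) up to the majority count."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_new = (len(y) - len(minority)) - len(minority)
    if n_new <= 0 or len(minority) < 2:
        return X, y  # nothing to do, or too few positives to interpolate
    new = smote_like(minority, n_new, seed=seed)
    return X + new, y + [1] * len(new)

def cross_validate(train_records, k=10, seed=0):
    X = [r["features"] for r in train_records]
    y = [r["dialysis"] for r in train_records]
    summaries = []
    for i, val_idx in enumerate(kfold_indices(len(X), k, seed)):
        val_set = set(val_idx)
        tr_idx = [j for j in range(len(X)) if j not in val_set]
        # Oversampling touches the training part of this fold only.
        X_tr, y_tr = balance_training_fold([X[j] for j in tr_idx],
                                           [y[j] for j in tr_idx], seed=seed + i)
        y_val = [y[j] for j in val_idx]  # validation fold keeps its real imbalance
        # A model would be fit on (X_tr, y_tr) and scored on the validation fold here.
        summaries.append((sum(y_tr), len(y_tr), sum(y_val), len(y_val)))
    return summaries

if __name__ == "__main__":
    rng = random.Random(42)
    records = [{"index_year": rng.randint(2007, 2018),
                "features": [rng.random(), rng.random()],
                "dialysis": int(rng.random() < 0.16)} for _ in range(200)]
    train, test = temporal_split(records, cutoff_year=2016)  # cutoff is hypothetical
    for pos_tr, n_tr, pos_val, n_val in cross_validate(train, k=10):
        print(f"train {pos_tr}/{n_tr} positive, validation {pos_val}/{n_val} positive")
```

The test set returned by `temporal_split` is never passed to the oversampler. With a library, placing `imblearn.over_sampling.SMOTE` inside an `imblearn.pipeline.Pipeline` gives the same behavior, because the sampler runs only when the pipeline is fit.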

The study included 1856 patients with ADPKD, of whom 302 (16.27%) progressed to dialysis. Multivariable Cox regression identified several significant risk factors, including age 66 years and older (hazard ratio [HR] 4.63, 95% CI 2.71‐7.92; P<.001), anemia (HR 4.33, 95% CI 3.25‐5.78; P<.001), congestive heart failure (HR 1.81, 95% CI 1.29‐2.54; P<.001), and acute kidney injury (HR 1.69, 95% CI 1.19‐2.41; P=.003). Among the ML models, the XGBoost model, using an optimized set of 27 features, demonstrated the highest predictive performance on the held-out temporal test set (accuracy 98.3%; area under the curve 0.955; F1-score 0.800; Brier score 0.022). The top predictors in the XGBoost model largely aligned with age, comorbidity burden, anemia, and cardiovascular disease markers. Medication use (eg, anticoagulants, loop diuretics, febuxostat) was also among the most influential predictors; however, medication-related predictors should be interpreted as proxies for disease complexity rather than direct risk modulators.
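The four test-set metrics reported above capture different properties. Accuracy and F1-score depend on a decision threshold, the area under the ROC curve depends only on how predictions rank patients, and the Brier score measures how close the predicted probabilities are to the observed 0/1 outcomes (lower is better). The sketch below computes each metric from true labels and predicted probabilities using only the standard library. The 0.5 threshold, the function names, and the toy inputs are assumptions for illustration; the study's own threshold and data are not given in this summary.

```python
def confusion(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion(y_true, y_prob, threshold)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Toy example (invented data): 2 of 10 patients progress to dialysis.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p = [0.05, 0.1, 0.2, 0.1, 0.3, 0.05, 0.6, 0.1, 0.8, 0.4]
print(accuracy(y, p), f1_score(y, p), round(brier_score(y, p), 4), roc_auc(y, p))
```

When the outcome is rare, a classifier that never predicts dialysis already reaches an accuracy equal to the share of non-progressors. That is one reason threshold-free (AUC) and probability-based (Brier) metrics are worth reading alongside accuracy.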

ML models can predict dialysis risk in patients with ADPKD using administrative data with temporal validation. This approach may support risk stratification by helping identify individuals at higher predicted risk who may warrant closer monitoring and further specialist evaluation.

## Linked entities

- **Chemicals:** febuxostat (PubChem CID 134018)
- **Diseases:** autosomal dominant polycystic kidney disease (MONDO:0004691), anemia (MONDO:0002280), congestive heart failure (MONDO:0005009), acute kidney injury (MONDO:0002492)

## Full-text entities

- **Genes:** AVPR2 (arginine vasopressin receptor 2) [NCBI Gene 554] {aka ADHR, DI1, DIR, DIR3, NDI, NDI1}, PKD1 (polycystin 1, transient receptor potential channel interacting) [NCBI Gene 5310] {aka PBP, PC1, Pc-1, TRPP1, eliosin}, DPP4 (dipeptidyl peptidase 4) [NCBI Gene 1803] {aka ADABP, ADCP2, CD26, DPPIV, TP103}, MTOR (mechanistic target of rapamycin kinase) [NCBI Gene 2475] {aka FRAP, FRAP1, FRAP2, RAFT1, RAPT1, SKS}, PKD2 (polycystin 2, transient receptor potential cation channel) [NCBI Gene 5311] {aka APKD2, PC2, PKD4, Pc-2, TRPP2}, REN (renin) [NCBI Gene 5972] {aka ADTKD4, HNFJ2, RTD}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** vascular disease (MESH:D014652), arrhythmia (MESH:D001145), kidney failure (MESH:D051437), Chronic inflammation (MESH:D007249), cyst growth (MESH:D006130), anxiety (MESH:D001007), intracranial aneurysms (MESH:D002532), liver cirrhosis (MESH:D008103), Hyperuricemia (MESH:D033461), chronic kidney disease (MESH:D051436), ischemic heart disease (MESH:D017202), CHC (MESH:D019698), Congestive heart failure (MESH:D006333), gout (MESH:D006073), asthma (MESH:D001249), Hypertension (MESH:D006973), nephrolithiasis (MESH:D053040), cardiorenal comorbidity (MESH:D059347), proteinuria (MESH:D011507), dementia (MESH:D003704), acute kidney injury (MESH:D058186), CKD (MESH:D012080), peptic ulcer (MESH:D010437), diverticulosis (MESH:D004240), atrial fibrillation (MESH:D001281), urinary tract infection (MESH:D014552), DM (MESH:D009223), ESRD (MESH:D007676), diabetes mellitus (MESH:D003920), frailty (MESH:D000073496), COPD (MESH:D029424), urate (MESH:C566013), ischemia (MESH:D007511), ADPKD (MESH:D016891), anemia (MESH:D000740), neuropsychiatric conditions (MESH:D001523), cardiovascular disease (MESH:D002318), CL (MESH:D002971), acute pancreatitis (MESH:D010195), renal decline (MESH:D006030), kidney damage (MESH:D007674), PKD (MESH:D007690), death (MESH:D003643), pneumonia (MESH:D011014), depression (MESH:D003866), cyst (MESH:D003560), Comorbidity (MESH:D004194), cholangitis (MESH:D002761), peripheral vascular disease (MESH:D016491), ischemic stroke (MESH:D002544), infection (MESH:D007239), bleeding (MESH:D006470), decline (MESH:D060825), dyslipidemia (MESH:D050171), metabolic acidosis (MESH:D000138), hemorrhagic stroke (MESH:D000083302), hematuria (MESH:D006417), glaucoma (MESH:D005901), Catastrophic Illness (MESH:D002388)
- **Chemicals:** vitamin K (MESH:D014812), metformin (MESH:D008687), sodium bicarbonate (MESH:D017693), aldosterone (MESH:D000450), MC (MESH:C061001), benzbromarone (MESH:D001553), urate (MESH:D014527), insulins (MESH:D061385), hydralazine (MESH:D006830), long-acting insulin (MESH:D049528), methyldopa (MESH:D008750), thiazolidinediones (MESH:D045162), clonidine (MESH:D003000), Tranexamic acid (MESH:D014148), sodium (MESH:D012964), fenofibrate (MESH:D011345), minoxidil (MESH:D008914), potassium (MESH:D011188), tolvaptan (MESH:D000077602), rapid-acting insulin (MESH:D061266), sulfonylureas (MESH:D013453), febuxostat (MESH:D000069465), allopurinol (MESH:D000493), β-blockers (-), cholesterol (MESH:D002784)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12991194/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12991194/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12991194/full.md

---
Source: https://tomesphere.com/paper/PMC12991194