Calibration, Uncertainty Communication, and Deployment Readiness in CKD Risk Prediction: A Framework Evaluation Study

Michael O. Eniolade

arXiv:2605.21566·cs.LG·May 22, 2026

TL;DR

This study evaluates calibration and uncertainty quantification in CKD risk prediction models. It finds that high internal accuracy does not guarantee reliable behaviour on external data, which underscores the need for external validation before deployment.

Contribution

The paper provides a framework for assessing the calibration, uncertainty, and deployment readiness of machine learning models in clinical settings, and it highlights the gap between internal and external performance.

Findings

01

All models achieved AUROC 1.00 internally but performed poorly externally.

02

Calibration improved with isotonic regression, but external calibration remained unstable.

03

Conformal coverage dropped significantly under external data shift.
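
To make the third finding concrete, here is a minimal sketch of how empirical conformal coverage can be checked on an internal and an external set. It assumes split conformal prediction with the nonconformity score 1 - p(true label); the listing does not state which conformal variant the paper uses. The function names and all probabilities below are hypothetical illustrations, not the paper's data or results.

```python
import math

def conformal_threshold(p_true, alpha):
    """Split-conformal threshold from calibration data.

    p_true: probability the model assigned to the true label of each
            calibration point (hypothetical values in this sketch).
    alpha:  target miscoverage rate, e.g. 0.25 for 75% nominal coverage.
    """
    scores = sorted(1.0 - p for p in p_true)  # nonconformity scores
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))      # finite-sample corrected rank
    # If k exceeds n, no finite threshold works: include every label.
    return scores[k - 1] if k <= n else 1.0

def prediction_set(p_ckd, qhat):
    """Labels (1 = CKD, 0 = no CKD) whose score is within the threshold."""
    probs = {1: p_ckd, 0: 1.0 - p_ckd}
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

def empirical_coverage(p_ckd_list, labels, qhat):
    """Fraction of points whose true label falls inside the prediction set."""
    hits = sum(y in prediction_set(p, qhat) for p, y in zip(p_ckd_list, labels))
    return hits / len(labels)

# Illustrative calibration set: probability assigned to the true label.
cal_p_true = [0.96, 0.91, 0.88, 0.97, 0.85, 0.93, 0.70]
qhat = conformal_threshold(cal_p_true, alpha=0.25)

# Illustrative in-distribution test set: predicted P(CKD) and true labels.
internal_p = [0.95, 0.05, 0.90, 0.10, 0.60, 0.97, 0.02, 0.92]
internal_y = [1, 0, 1, 0, 1, 1, 0, 1]

# Illustrative shifted set: lower prevalence, model still confident in CKD.
external_p = [0.95, 0.90, 0.93, 0.97, 0.10, 0.88, 0.05, 0.91]
external_y = [0, 0, 1, 0, 0, 0, 0, 1]

internal_cov = empirical_coverage(internal_p, internal_y, qhat)
external_cov = empirical_coverage(external_p, external_y, qhat)
print(f"internal coverage: {internal_cov:.3f}")
print(f"external coverage: {external_cov:.3f}")
```

The coverage guarantee of split conformal prediction rests on the calibration and test points being exchangeable. When prevalence or feature distributions change, as in the sketch's second set, that assumption fails and observed coverage can fall below the nominal level.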

Abstract

Machine learning models for chronic kidney disease (CKD) risk prediction often post strong discrimination scores on internal test sets. Calibration and uncertainty quantification get far less attention, leaving clinicians without reliable information about whether the probability outputs are accurate. We trained five classifiers on the UCI CKD dataset (400 patients, 62.5% CKD prevalence): logistic regression, random forest, XGBoost, SVM with Platt scaling, and Gaussian naive Bayes. We evaluated each across calibration quality, conformal prediction coverage, and an eight-criterion deployment readiness framework. A distributional stress-test applied the best-calibrated variant of each model to the open-access MIMIC-IV demo cohort (97 patients, 23.7% CKD) to assess behaviour under prevalence shift and feature missingness. We measured calibration before and after Platt scaling and…
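
The distinction the abstract draws between discrimination and calibration can be illustrated with a small sketch. It assumes two common calibration metrics, the Brier score and an equal-width-bin expected calibration error (ECE); the visible part of the abstract does not say which metrics or binning the paper uses, so the function names, the five-bin choice, and the toy data are all assumptions made for illustration.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def expected_calibration_error(probs, labels, n_bins=5):
    """ECE with equal-width bins (the binning scheme is an assumption here)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # Weight each bin's confidence/outcome gap by its share of the data.
        ece += len(members) / len(probs) * abs(mean_conf - frac_pos)
    return ece

# Toy outcomes: 1 = CKD, 0 = no CKD (hypothetical, not from the paper).
labels = [1, 1, 1, 0, 0, 0, 0, 1]

# Two hypothetical models that rank patients identically, so any
# rank-based metric such as AUROC is the same for both.
overconfident = [0.99, 0.99, 0.99, 0.99, 0.01, 0.01, 0.01, 0.01]
tempered = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]

ece_over = expected_calibration_error(overconfident, labels)
ece_temp = expected_calibration_error(tempered, labels)
brier_over = brier_score(overconfident, labels)
brier_temp = brier_score(tempered, labels)

print(f"overconfident: ECE={ece_over:.4f}  Brier={brier_over:.4f}")
print(f"tempered:      ECE={ece_temp:.4f}  Brier={brier_temp:.4f}")
```

Both toy models order the patients the same way, yet only the tempered one reports probabilities that match the observed event rates. Recalibration methods such as Platt scaling or isotonic regression aim to close this kind of gap without changing the ranking.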
