Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare

Elizabeth W. Miller; Jeffrey D. Blume

arXiv:2603.00192 · cs.LG · April 16, 2026

TL;DR

This paper introduces diagnostics that measure how stable individual-level predictions are in healthcare machine learning models, arguing that instability hidden by aggregate performance metrics can undermine clinical trust.

Contribution

The paper proposes an evaluation framework with two diagnostics, empirical prediction interval width and decision flip rate, that quantify the variability of individual predictions.
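To make the two diagnostics concrete, here is a minimal sketch of how they could be computed for one patient from predictions produced by repeated training runs. The definitions are assumptions, since this summary does not give the paper's formulas: interval width is taken as the spread between two empirical quantiles, and flip rate as the share of runs whose thresholded decision disagrees with the majority decision. The function names, quantile levels, and example values are hypothetical.

```python
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def interval_width(preds: Sequence[float],
                   lower: float = 0.025,
                   upper: float = 0.975) -> float:
    """Assumed definition: distance between two empirical quantiles of
    one patient's predicted risks across repeated training runs."""
    return quantile(preds, upper) - quantile(preds, lower)


def flip_rate(preds: Sequence[float], threshold: float = 0.5) -> float:
    """Assumed definition: fraction of runs whose thresholded decision
    differs from the majority decision for this patient."""
    positive = sum(p >= threshold for p in preds)
    minority = min(positive, len(preds) - positive)
    return minority / len(preds)


# Hypothetical predicted risks for two patients across five training runs.
patient_a = [0.10, 0.12, 0.11, 0.13, 0.09]  # stable and far from the threshold
patient_b = [0.42, 0.55, 0.48, 0.61, 0.39]  # unstable and straddling the threshold

for name, preds in (("A", patient_a), ("B", patient_b)):
    print(name, round(interval_width(preds), 3), flip_rate(preds))
```

Both quantities are computed per patient, so they can expose instability that an aggregate metric averaged over all patients would not show.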

Findings

01. Randomness from optimization and initialization can produce variability in individual predictions comparable to that from resampling the training data.

02. Neural networks show greater instability than logistic regression.

03. Instability near decision thresholds can change treatment recommendations.
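The third finding can be illustrated with simple arithmetic: the same run-to-run spread in predicted risk changes a decision only when the patient's risk sits close to the threshold. The sketch below is a toy illustration, not the paper's experiment. The threshold, the deviations, and the flip-rate definition (share of runs disagreeing with the majority decision) are all assumptions.

```python
def decision_flip_rate(preds, threshold):
    """Assumed definition: share of runs whose decision differs from the majority."""
    positive = sum(p >= threshold for p in preds)
    return min(positive, len(preds) - positive) / len(preds)


# Hypothetical run-to-run deviations, identical for every patient.
offsets = [-0.06, -0.03, 0.0, 0.03, 0.06]
threshold = 0.20  # hypothetical treatment threshold

rates = {}
for mean_risk in (0.05, 0.18, 0.21, 0.40):
    preds = [mean_risk + d for d in offsets]
    rates[mean_risk] = decision_flip_rate(preds, threshold)

print(rates)
```

In this toy setup, patients with mean risk 0.05 or 0.40 never flip, while those at 0.18 or 0.21 see the recommendation change in two of five runs, even though the spread of their predictions is identical.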

Abstract

In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that…
