An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness
Ioannis Bilionis, Ricardo C. Berrios, Luis Fernandez-Luque, Carlos Castillo

TL;DR
This study empirically evaluates the risks associated with updating AI models in clinical settings, focusing on stability, arbitrariness, and fairness, using diabetes data to inform trustworthy decision support systems.
Contribution
It introduces a monitoring framework to detect risks from model updates, emphasizing the importance of continuous oversight for clinical AI reliability.
Findings
Model updates can cause prediction flips and instability.
Updates may increase arbitrariness and reduce fairness.
Continuous monitoring is crucial for trustworthy clinical AI.
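The findings above refer to prediction flips without defining them in this summary. As a minimal sketch, assume a flip is a case whose predicted label differs between the old and the updated model, and that comparing flip rates across sociodemographic groups gives a rough view of whether an update affects groups unevenly. The function names, the toy predictions, and the group labels below are hypothetical illustrations, not the paper's metrics or data.

```python
from collections import defaultdict


def flip_rate(old_preds, new_preds):
    """Fraction of cases whose predicted label changes after a model update."""
    if len(old_preds) != len(new_preds):
        raise ValueError("prediction lists must have the same length")
    if not old_preds:
        raise ValueError("need at least one prediction")
    flips = sum(o != n for o, n in zip(old_preds, new_preds))
    return flips / len(old_preds)


def flip_rate_by_group(old_preds, new_preds, groups):
    """Flip rate computed separately for each (hypothetical) group label."""
    if not (len(old_preds) == len(new_preds) == len(groups)):
        raise ValueError("all inputs must have the same length")
    counts = defaultdict(lambda: [0, 0])  # group -> [flips, total]
    for o, n, g in zip(old_preds, new_preds, groups):
        counts[g][0] += int(o != n)
        counts[g][1] += 1
    return {g: f / t for g, (f, t) in counts.items()}


# Toy example: binary predictions from a model before and after an update.
old = [1, 0, 0, 1, 1, 0, 1, 0]
new = [1, 1, 0, 1, 0, 0, 1, 1]
grp = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall = flip_rate(old, new)
per_group = flip_rate_by_group(old, new, grp)
print(overall, per_group)
```

In this toy data, a gap between the per-group flip rates would be a signal worth investigating; it is not by itself evidence of unfairness, since it ignores whether the flipped predictions became more or less accurate.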
Abstract
Artificial Intelligence and Machine Learning (AI/ML) models are increasingly deployed in clinical settings to support decision-making. However, when training data become stale due to changes in demographics, environment, or patient behaviors, model performance can degrade substantially. While updating models with new training data is necessary, such updates may also introduce new risks. We evaluated the proposed monitoring framework on four publicly available U.S.-based type 1 diabetes datasets containing high-resolution continuous glucose monitoring (CGM) data, comprising approximately 11,300 weekly observations from 496 participants under 20 years of age. All datasets included structured sociodemographic information. Using the prediction of severe hyperglycemia events in children with type 1 diabetes as a case study, we examine how different model update strategies can…
