Instability in clinical risk stratification models using deep learning
Daniel Lopez-Martinez, Alex Yakubovich, Martin Seneviratne, Adam D. Lelkes, Akshit Tyagi, Jonas Kemp, Ethan Steinberg, N. Lance Downing, Ron C. Li, Keith E. Morse, Nigam H. Shah, Ming-Jun Chen

TL;DR
This paper investigates the instability of deep learning models in healthcare, revealing that repeated training can lead to patient-level outcome variability despite stable overall performance, and proposes metrics and strategies to improve stability.
Contribution
It introduces new stability metrics and mitigation strategies for deep learning models trained on electronic health records in clinical risk prediction.
Findings
Repeated training yields patient-level outcome variability.
Global performance metrics remain stable despite patient-level instability.
Two proposed stability metrics measure the effect of training randomness.
Abstract
While it has been well known in the ML community that deep learning models suffer from instability, the consequences for healthcare deployments are under-characterised. We study the stability of different model architectures trained on electronic health records, using a set of outpatient prediction tasks as a case study. We show that repeated training runs of the same deep learning model on the same training data can result in significantly different outcomes at a patient level even though global performance metrics remain stable. We propose two stability metrics for measuring the effect of randomness of model training, as well as mitigation strategies for improving model stability.
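The abstract does not define the two proposed metrics, so the following is only an illustrative sketch of how patient-level instability across repeated training runs could be quantified, not the authors' method. The function names `per_patient_spread` and `flip_rate`, the 0.5 decision threshold, and the synthetic predictions are all assumptions made for this example.

```python
import random
import statistics


def per_patient_spread(runs):
    """Mean over patients of the std. dev. of predicted risk across runs.

    `runs` is a list of prediction lists, one per training run, all in the
    same patient order. Hypothetical metric, not taken from the paper.
    """
    n_patients = len(runs[0])
    spreads = [
        statistics.pstdev([run[i] for run in runs]) for i in range(n_patients)
    ]
    return statistics.fmean(spreads)


def flip_rate(runs, threshold=0.5):
    """Fraction of patients whose thresholded label differs between runs.

    The threshold is an assumed operating point for illustration.
    """
    n_patients = len(runs[0])
    flips = 0
    for i in range(n_patients):
        labels = {run[i] >= threshold for run in runs}
        if len(labels) > 1:  # at least two runs disagree on this patient
            flips += 1
    return flips / n_patients


if __name__ == "__main__":
    # Synthetic stand-in for five retrainings of one model on the same data:
    # a shared base risk per patient plus small run-specific noise.
    rng = random.Random(0)
    base = [rng.random() for _ in range(1000)]
    runs = [
        [min(1.0, max(0.0, p + rng.gauss(0, 0.05))) for p in base]
        for _ in range(5)
    ]
    # A global summary (mean predicted risk per run) barely moves...
    print("mean risk per run:", [round(statistics.fmean(r), 3) for r in runs])
    # ...while individual patients can still change risk or label.
    print("per-patient spread:", per_patient_spread(runs))
    print("flip rate at 0.5:", flip_rate(runs))
```

In this sketch a global summary can look nearly identical across runs while a nonzero share of individual patients crosses the decision threshold, which is the kind of gap between global and patient-level behaviour that the abstract describes.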
Taxonomy
Topics: Machine Learning in Healthcare · Medical Coding and Health Information · Chronic Disease Management Strategies
