On Arbitrary Predictions from Equally Valid Models
Sarah Lockfisch, Kristian Schwethelm, Martin Menten, Rickmer Braren, Daniel Rueckert, Alexander Ziller, Georgios Kaissis

TL;DR
This paper investigates predictive multiplicity in medical machine learning models: multiple equally valid models can produce different predictions for the same patient. It also demonstrates that small ensembles can mitigate this issue and improve diagnostic reliability.
Contribution
It provides an empirical analysis of predictive multiplicity in medical models, covering its causes and consequences, and proposes ensemble strategies to address it.
Findings
Standard validation metrics do not identify a unique optimal model.
Predictive multiplicity can lead to arbitrary patient diagnoses.
Small ensembles with abstention strategies effectively reduce predictive multiplicity.
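The last finding can be illustrated with a minimal sketch of majority voting with abstention. This is not the authors' implementation: the function name `ensemble_predict`, the `min_agreement` threshold, and the example votes are all hypothetical, chosen only to show the general idea that an ensemble can decline to predict when its members disagree too much.

```python
from collections import Counter

def ensemble_predict(votes, min_agreement=1.0):
    """Return the majority label, or None (abstain) if agreement is too low.

    votes: one predicted class label per model in the ensemble.
    min_agreement: hypothetical threshold, the fraction of models that
        must agree on the majority label for a prediction to be issued.
    """
    if not votes:
        raise ValueError("need at least one vote")
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

# Made-up votes from five hypothetical models on three patients.
patient_votes = [
    [1, 1, 1, 1, 1],  # unanimous
    [1, 0, 1, 1, 0],  # only 3 of 5 agree
    [0, 0, 0, 1, 0],  # 4 of 5 agree
]
decisions = [ensemble_predict(v, min_agreement=0.8) for v in patient_votes]
print(decisions)
```

With a threshold of 0.8, the second patient receives no automated prediction and could instead be referred for further review. A stricter threshold trades coverage for consistency.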
Abstract
Model multiplicity refers to the existence of multiple machine learning models that describe the data equally well but may produce different predictions on individual samples. In medicine, these models can admit conflicting predictions for the same patient, a risk that is poorly understood and insufficiently addressed. In this study, we empirically analyze the extent, drivers, and ramifications of predictive multiplicity across diverse medical tasks and model architectures, and show that even small ensembles can mitigate or eliminate predictive multiplicity in practice. Our analysis reveals that (1) standard validation metrics fail to identify a uniquely optimal model and (2) a substantial number of predictions hinge on arbitrary choices made during model development. Using multiple models instead of a single model reveals instances where predictions differ across equally plausible…
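To make the notion of conflicting predictions from equally good models concrete, here is a small sketch under stated assumptions. The labels, the three models' predictions, and the helper names `accuracy` and `disagreement_rate` are invented for illustration; the paper may quantify multiplicity with different measures.

```python
def accuracy(preds, labels):
    """Fraction of samples where the prediction matches the label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def disagreement_rate(predictions):
    """Fraction of samples on which at least two models disagree.

    predictions: list of per-model prediction lists of equal length.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same samples")
    conflicting = sum(1 for sample in zip(*predictions) if len(set(sample)) > 1)
    return conflicting / n_samples

# Made-up ground truth and predictions from three hypothetical models.
labels  = [1, 0, 1, 0]
model_a = [1, 0, 1, 1]
model_b = [1, 0, 0, 0]
model_c = [1, 1, 1, 0]

models = [model_a, model_b, model_c]
accuracies = [accuracy(m, labels) for m in models]
rate = disagreement_rate(models)
print(accuracies, rate)
```

In this toy case all three models reach the same accuracy, so a validation metric alone cannot pick a winner, yet they give conflicting predictions on three of the four samples.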
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
