Predictive divergence in machine learning models for clinical mortality risk: A multicohort study of covid-19 patients
Júlia Chaves Neuenschwander Magalhães, Alexandre Dias Porto Chiavegatto Filho

TL;DR
This study shows that different machine learning models can give very different mortality risk predictions for the same patients, even when overall performance is similar.
Contribution
The study reveals that ML models with similar global performance can diverge in predictions across patient subgroups, emphasizing the need for subgroup-aware model evaluation.
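As an illustration of what subgroup-aware evaluation can look like, the following minimal sketch computes AUC separately for each patient subgroup and compares it with the pooled value. It is not the authors' code: the function names, the subgroup labels "A" and "B", and the toy outcomes and scores are all hypothetical.

```python
from collections import defaultdict


def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(y_true, scores, groups):
    """Return {subgroup: AUC}; None where a subgroup contains only one outcome class."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(y_true, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    result = {}
    for g, (ys, ss) in buckets.items():
        # AUC is undefined without at least one positive and one negative case
        result[g] = auc(ys, ss) if len(set(ys)) == 2 else None
    return result


# Hypothetical toy data: 1 = died in hospital, 0 = survived
y_true = [0, 1, 0, 1, 0, 1, 0, 1]
scores = [0.2, 0.9, 0.1, 0.8, 0.7, 0.4, 0.3, 0.6]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_auc = auc(y_true, scores)
subgroup_auc = auc_by_subgroup(y_true, scores, groups)
print(overall_auc, subgroup_auc)
```

In this toy example the pooled AUC hides the fact that the model ranks patients well in one subgroup and no better than chance in the other, which is the kind of gap a global metric alone would not reveal.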
Findings
ML models showed high overall performance (mean AUC of 0.855) but significant prediction divergence at the individual level (R² from 0.56 to 0.80).
Five patient subgroups had mortality rates ranging from 22% to 80%, with model performance varying significantly (F = 73.18, p < 0.001).
TabPFN and LightGBM performed best in the 'Anemia' subgroup, while TabPFN underperformed in the 'Immunodeficient' subgroup.
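The gap between similar global performance and individual-level divergence can be made concrete with a small sketch. It assumes that agreement between two models is measured as the squared Pearson correlation (R²) of their predicted probabilities; the summary does not state how the study computed R², so this is an assumption, and the model names and toy predictions are hypothetical.

```python
from itertools import combinations


def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def r_squared(a, b):
    """Squared Pearson correlation between two vectors of predicted risks (assumed metric)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


def pairwise_r_squared(predictions):
    """Return {(model_1, model_2): R²} for every pair of models."""
    return {
        (m1, m2): r_squared(predictions[m1], predictions[m2])
        for m1, m2 in combinations(sorted(predictions), 2)
    }


# Hypothetical toy data: both models separate the classes perfectly,
# yet they assign different risks to the same patients.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = {
    "model_a": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    "model_b": [0.4, 0.1, 0.3, 0.2, 0.9, 0.6, 0.8, 0.7],
}

aucs = {name: auc(y_true, p) for name, p in predictions.items()}
agreement = pairwise_r_squared(predictions)
print(aucs, agreement)
```

Here both toy models reach the same AUC while their patient-level predictions agree only partially, so the choice between them would change the risk assigned to individual patients.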
Abstract
Machine learning (ML) algorithms are increasingly used in healthcare to support clinical decision-making. While models with similar overall performance are often considered interchangeable for deployment, they may produce divergent predictions, a phenomenon known as algorithmic multiplicity. In such cases, the choice of algorithm may introduce bias. This study investigates the impacts of algorithmic multiplicity in mortality prediction and assesses the influence of patient characteristics on model decisions. A cohort of 4,337 adult patients (≥18 years) with RT-PCR–confirmed covid-19 from five tertiary care hospitals in Brazil was followed from March to August 2020. Five popular ML models for structured data were trained on demographic and laboratory data collected at early hospital admission to predict in-hospital mortality. Model performance, feature importance, and algorithmic…
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI
