Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection
Conor K. Corbin, Michael Baiocchi, Jonathan H. Chen

TL;DR
This paper investigates how label selection biases affect performance estimates of clinical machine learning models and proposes methods to correct these biases for more accurate evaluation.
Contribution
It introduces a framework for understanding label selection bias, evaluates its impact through simulations, and proposes a combined randomization and weighting approach to recover true model performance.
Findings
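Naive performance estimates can be substantially biased by label selection: selection that depends on observed features can distort estimates of discrimination, and selection that depends on labels can distort estimates of calibration.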
Properly specified weighting estimators can recover true performance metrics.
A combined randomization and weighting deployment procedure improves performance estimation accuracy.
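The findings above can be illustrated with a small simulation. The sketch below is not the paper's simulation code. It assumes a hypothetical single risk feature `x`, a hypothetical selection probability `0.05 + 0.9 * x**2` that is known exactly, an arbitrary decision threshold of 0.2, and normalized inverse probability weighting as one example of a weighting estimator. All function and variable names are illustrative.

```python
import random


def simulate(n=100_000, threshold=0.2, seed=0):
    """Simulate patients whose labels are observed with a feature-dependent probability.

    Everything here is an illustrative assumption, not the paper's setup.
    Returns one (correct, p_sel, observed) tuple per patient.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                    # hypothetical risk feature in [0, 1)
        y = 1 if rng.random() < x else 0    # true label, P(y = 1 | x) = x
        pred = 1 if x > threshold else 0    # model's binary prediction
        p_sel = 0.05 + 0.9 * x * x          # assumed known selection probability
        observed = rng.random() < p_sel     # is the label observed?
        rows.append((int(pred == y), p_sel, observed))
    return rows


def accuracy_estimates(rows):
    """Return (full-population, naive, weighted) accuracy estimates."""
    # Accuracy over everyone; unavailable in practice because labels are missing.
    full = sum(c for c, _, _ in rows) / len(rows)

    # Naive estimate: unweighted accuracy on patients with observed labels.
    obs = [(c, p) for c, p, o in rows if o]
    naive = sum(c for c, _ in obs) / len(obs)

    # Weighted estimate: each observed patient counts 1 / p_sel times,
    # normalized by the sum of the weights.
    weight_sum = sum(1.0 / p for _, p in obs)
    weighted = sum(c / p for c, p in obs) / weight_sum
    return full, naive, weighted


full, naive, weighted = accuracy_estimates(simulate())
print(f"full population: {full:.3f}")
print(f"naive (observed only): {naive:.3f}")
print(f"inverse probability weighted: {weighted:.3f}")
```

In this setup the naive estimate overstates accuracy because high-risk patients, for whom the model is more often correct, are over-represented among the observed labels. The weighted estimate should land close to the full-population value, but only because the selection probabilities are specified correctly here; with misspecified probabilities the correction is not guaranteed.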
Abstract
When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates on the observed population may be misleading. In this study we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from causal inference literature and find that when selection probabilities are…
Taxonomy
Topics: Statistical Methods in Clinical Trials · Explainable Artificial Intelligence (XAI) · Statistical Methods and Bayesian Inference
