Holdout sets for safe predictive model updating
Sami Haidar-Wehbe, Samuel R Emerson, Louis J M Aslett, James Liley

TL;DR
This paper introduces a holdout set approach for updating predictive risk scores to maintain accuracy over time without bias, demonstrating its effectiveness with a pre-eclampsia risk score example.
Contribution
It proposes a method for selecting an optimal holdout set size to improve risk score updates, with algorithms for estimating this size and practical application to pre-eclampsia risk prediction.
Findings
Optimal holdout size around 10,000 individuals for pre-eclampsia score
Holdout-based updating reduces adverse outcome frequency to an asymptotically optimal level
Algorithms for estimating holdout size are effective and practical
Abstract
Predictive risk scores for adverse outcomes are increasingly crucial in guiding health interventions. Such scores may need to be periodically updated due to changes in the distributions they model. However, directly updating risk scores used to guide intervention can lead to biased risk estimates. To address this, we propose updating using a 'holdout set' - a subset of the population that does not receive interventions guided by the risk score. Balancing the holdout set size is essential to ensure good performance of the updated risk score whilst minimising the number of held out samples. We prove that this approach reduces adverse outcome frequency to an asymptotically optimal level and argue that often there is no competitive alternative. We describe conditions under which an optimal holdout size (OHS) can be readily identified, and introduce parametric and semi-parametric algorithms…
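The trade-off behind the holdout size can be illustrated with a small sketch. It is not the authors' algorithm. It assumes, purely for illustration, that each of the n held-out individuals incurs a fixed average cost k1, that each of the remaining N - n individuals incurs a cost k2(n) that falls as the score is trained on more holdout samples, and that k2 follows a power-law learning curve k2(n) = a * n^(-b) + c. The function names, this cost model, and all parameter values are hypothetical and do not come from the summary above.

```python
def total_cost(n, N, k1, a, b, c):
    """Total expected cost when n of N individuals are held out.

    Assumed (hypothetical) cost model:
      - each held-out individual costs k1 (no score-guided intervention)
      - each remaining individual costs k2(n) = a * n**(-b) + c, which
        shrinks as the score is trained on a larger holdout set
    """
    k2 = a * n ** (-b) + c
    return k1 * n + k2 * (N - n)


def optimal_holdout_size(N, k1, a, b, c):
    """Brute-force search over integer holdout sizes 1..N-1 for the minimum cost."""
    return min(range(1, N), key=lambda n: total_cost(n, N, k1, a, b, c))


if __name__ == "__main__":
    # Illustrative values only; not estimates from the paper.
    N, k1, a, b, c = 50_000, 0.5, 5.0, 0.8, 0.3
    ohs = optimal_holdout_size(N, k1, a, b, c)
    print("holdout size minimising the assumed cost:", ohs)
    print("total cost at that size:", round(total_cost(ohs, N, k1, a, b, c), 1))
```

Under these assumptions, a very small holdout set yields a poorly trained score for everyone else, while a very large one withholds guided intervention from too many people, so the minimum lies in between. The brute-force search is used here only because it is easy to verify; it stands in for, and should not be confused with, the parametric and semi-parametric estimators the abstract mentions.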
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Health Systems, Economic Evaluations, Quality of Life
