Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation
Gabriel Loewinger, Rolando Acosta Nunez, Rahul Mazumder, Giovanni Parmigiani

TL;DR
This paper introduces an optimal ensemble construction method for multi-study prediction tasks, improving out-of-study generalization especially in heterogeneous biomedical datasets like COVID-19 mortality prediction.
Contribution
It proposes a joint estimation approach for ensemble weights and study-specific model parameters, unifying and extending existing multi-study stacking and pooling methods.
Findings
Outperforms standard methods in COVID-19 mortality prediction.
Improves prediction accuracy with limited data from new countries.
Remains competitive across various heterogeneity levels.
Abstract
It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets and applying standard statistical learning methods can result in poor out-of-study prediction performance when datasets are heterogeneous. Theoretical and applied work has shown multi-study ensembling to be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multi-study ensembling uses a two-stage strategy which fits study-specific models and estimates ensemble weights separately. This approach ignores, however, the ensemble properties at the model-fitting stage, potentially resulting in a loss of efficiency. We therefore propose optimal ensemble construction, an approach to…
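The two-stage strategy described in the abstract can be illustrated with a small sketch. This is not the paper's proposed method; it is only the baseline the abstract contrasts against, under several assumptions: the data are simulated, each study-specific model is a one-covariate least-squares line, and the ensemble weights are estimated by nonnegative least squares solved with projected gradient descent. The names `fit_line`, `stack_weights`, and `mse` are hypothetical helpers introduced for illustration.

```python
import random

def fit_line(xs, ys):
    """Stage 1 (assumed model): least-squares line y = a + b*x for one study."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predict(model, xs):
    a, b = model
    return [a + b * x for x in xs]

def stack_weights(preds, ys, steps=2000):
    """Stage 2 (assumed solver): nonnegative least-squares ensemble weights.

    preds[k][i] is model k's prediction for observation i. The models are
    held fixed here, which is the separation the abstract points out.
    """
    K, n = len(preds), len(ys)
    # Upper bound on the largest eigenvalue of the scaled Gram matrix,
    # so a step of 1/lip never increases the objective.
    lip = sum(p * p for row in preds for p in row) / n
    w = [1.0 / K] * K  # start from equal weights
    for _ in range(steps):
        resid = [sum(w[k] * preds[k][i] for k in range(K)) - ys[i]
                 for i in range(n)]
        grad = [sum(preds[k][i] * resid[i] for i in range(n)) / n
                for k in range(K)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, w[k] - grad[k] / lip) for k in range(K)]
    return w

# Simulated heterogeneous studies: same covariate, different true slopes.
random.seed(0)
studies = []
for slope in (1.0, 1.5, 2.0):
    xs = [random.uniform(-1, 1) for _ in range(50)]
    ys = [slope * x + random.gauss(0, 0.1) for x in xs]
    studies.append((xs, ys))

# Stage 1: one model per study.
models = [fit_line(xs, ys) for xs, ys in studies]

# Stage 2: every model predicts on the pooled data; weights are fit to those.
all_x = [x for xs, _ in studies for x in xs]
all_y = [y for _, ys in studies for y in ys]
preds = [predict(m, all_x) for m in models]
weights = stack_weights(preds, all_y)

def mse(w):
    """Mean squared error of the weighted ensemble on the pooled data."""
    n = len(all_y)
    return sum((sum(w[k] * preds[k][i] for k in range(len(w))) - all_y[i]) ** 2
               for i in range(n)) / n

print("weights:", weights)
print("stacked MSE:", mse(weights), "equal-weight MSE:", mse([1 / 3] * 3))
```

A prediction for a new study would be the weighted sum of the study-specific model predictions. Because stage 1 never sees the weights, the study-specific fits cannot adapt to their role in the ensemble, which is the inefficiency the abstract attributes to the two-stage strategy.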
Taxonomy
Topics: Machine Learning in Healthcare · COVID-19 diagnosis using AI · Artificial Intelligence in Healthcare
