TL;DR
This study empirically evaluates distributionally robust optimization (DRO) and variations of standard training procedures as ways to improve worst-case predictive performance across patient subpopulations. It finds that standard methods often outperform the specialized approaches on clinical data.
Contribution
The paper provides a large-scale empirical comparison of DRO and standard learning methods for clinical outcome prediction from electronic health records data, and introduces an extension to DRO that supports customizable worst-case metrics.
Findings
Standard approaches generally outperform DRO on worst-case subpopulation performance.
Few methods significantly improve disaggregated performance over subpopulations relative to standard approaches.
Improving worst-case performance may require better data collection rather than algorithmic changes.
Abstract
Predictive models for clinical outcomes that are accurate on average in a patient population may underperform drastically for some subpopulations, potentially introducing or reinforcing inequities in care access and quality. Model training approaches that aim to maximize worst-case model performance across subpopulations, such as distributionally robust optimization (DRO), attempt to address this problem without introducing additional harms. We conduct a large-scale empirical study of DRO and several variations of standard learning procedures to identify approaches for model development and selection that consistently improve disaggregated and worst-case performance over subpopulations compared to standard approaches for learning predictive models from electronic health records data. In the course of our evaluation, we introduce an extension to DRO approaches that allows for…
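To make the terms "disaggregated" and "worst-case" performance concrete, the following is a minimal sketch, not the authors' code. It assumes binary outcomes, predicted probabilities, and one subpopulation label per patient, and it uses log loss as the metric purely for illustration. The function names per_group_log_loss and worst_case_loss are hypothetical.

```python
import numpy as np

def per_group_log_loss(y_true, y_prob, groups, eps=1e-12):
    """Disaggregated evaluation: mean binary log loss for each subpopulation.

    Illustrative helper (hypothetical name); log loss is an assumed metric.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never receives exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    groups = np.asarray(groups)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return {g: float(losses[groups == g].mean())
            for g in np.unique(groups).tolist()}

def worst_case_loss(group_losses):
    """Worst-case performance: the largest per-group loss."""
    return max(group_losses.values())

# Toy data: the model is well calibrated on group "a" and weaker on group "b".
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.6, 0.4]
groups = ["a", "a", "b", "b"]

group_losses = per_group_log_loss(y_true, y_prob, groups)
print(group_losses)
print("worst case:", worst_case_loss(group_losses))
```

In this toy example the pooled average loss hides the gap between the two groups, which is the situation the abstract describes. Roughly speaking, a DRO-style training objective targets the worst-case quantity rather than the pooled average.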
