Investigating Data Interventions for Subgroup Fairness: An ICU Case Study
Erin Tan, Judy Hanwen Shen, Irene Y. Chen

TL;DR
This study examines how combining data sources affects subgroup fairness in healthcare machine learning models, revealing that data addition can both improve and impair fairness and performance.
Contribution
The paper highlights the limitations of data scaling as a fairness intervention, compares data-centric and model-based strategies, and argues that the two approaches work best in combination.
Findings
Data addition can both help and hurt model fairness and performance.
Many intuitive data selection strategies are unreliable.
Combining data and model calibration improves subgroup performance.
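To make "subgroup performance" concrete, the sketch below computes per-subgroup accuracy, the worst-group accuracy, and the gap between the best and worst groups. It is an illustration only, not the paper's evaluation code: the function names, the toy labels, and the choice of accuracy as the metric are all assumptions introduced here.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return a dict mapping each subgroup label to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_and_gap(per_group):
    """Return (worst-group accuracy, best-minus-worst gap)."""
    values = list(per_group.values())
    return min(values), max(values) - min(values)


# Toy data (hypothetical): two subgroups, "A" and "B", four patients each.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_accuracy(y_true, y_pred, groups)
worst, gap = worst_group_and_gap(per_group)
print(per_group, worst, gap)
```

Running the same computation on a model trained before and after adding a data source is one simple way to check whether the addition helped or hurt a given subgroup.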
Abstract
In high-stakes settings where machine learning models are used to automate decision-making about individuals, the presence of algorithmic bias can exacerbate systemic harm to certain subgroups of people. These biases often stem from the underlying training data. In practice, interventions to "fix the data" depend on the actual additional data sources available -- many of which are less than ideal. In these cases, the effects of data scaling on subgroup performance become volatile, as the improvements from increased sample size are counteracted by the introduction of distribution shifts in the training set. In this paper, we investigate the limitations of combining data sources to improve subgroup performance within the context of healthcare. Clinical models are commonly trained on datasets composed of patient electronic health record (EHR) data from different hospitals or admission…
