Data (in)equities in data science: Dissecting systemic and systematic biases in pulse oximetry
Lillian Rountree, Harsh Parikh, Bhramar Mukherjee

TL;DR
This paper operationalizes data equity concepts in data science, using pulse oximetry as a case study to identify and address systemic biases that drive health disparities.
Contribution
It translates abstract data equity principles into precise, testable statistical formulations and demonstrates their application in analyzing racial disparities in pulse oximetry.
Findings
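Traced how a single upstream information bias (differential measurement error across racial groups in pulse oximetry) compounds through the analytic pipeline into treatment disparities, fairness violations, and adverse health outcomes.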
Provided a systematic framework for diagnosing sources of outcome disparities.
Highlighted the distinct roles of prediction, decision, and data equity.
Abstract
Data equity is an emerging framework for responsible data science. However, its core concepts, including fairness, representativeness, and information bias, remain largely abstract and general, lacking the mathematical specificity needed for practical implementation. In this paper, we demonstrate how statisticians can operationalize data equity by translating its tenets into precise, testable formulations tailored to a given problem. Using the well-documented case of differential measurement error across racial groups in pulse oximetry, we first adopt an oracle approach, tracing how a single upstream violation of information bias compounds through the analytic pipeline into treatment disparities, fairness violations, and adverse health outcomes. We then demonstrate the inverse: starting from an observed outcome disparity, the data equity framework provides a principled structure for…
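The compounding described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' formulation: the function name `simulate_missed_rate`, the saturation distribution, the treatment threshold of 88, and the bias and noise values are all hypothetical choices made for illustration. It assumes a device that overestimates true oxygen saturation by a group-specific amount and a decision rule that treats only when the reading falls below the threshold.

```python
import random


def simulate_missed_rate(n, bias, noise_sd, seed, threshold=88.0):
    """Fraction of truly hypoxemic patients whom the device reading misses.

    All numbers are hypothetical. `bias` is the mean overestimate
    (in percentage points) of the device reading relative to the
    true saturation for one group.
    """
    rng = random.Random(seed)
    hypoxemic = 0  # patients whose true saturation is below the threshold
    missed = 0     # of those, patients whose reading is at or above it
    for _ in range(n):
        true_sat = rng.gauss(92.0, 4.0)  # hypothetical true saturation
        reading = true_sat + bias + rng.gauss(0.0, noise_sd)  # device value
        if true_sat < threshold:
            hypoxemic += 1
            if reading >= threshold:
                # The decision rule sees no need to treat this patient.
                missed += 1
    return missed / hypoxemic


# Same seed for both groups, so the only difference is the device bias.
rate_group_a = simulate_missed_rate(20000, bias=0.0, noise_sd=1.5, seed=0)
rate_group_b = simulate_missed_rate(20000, bias=2.0, noise_sd=1.5, seed=0)

print(f"missed-hypoxemia rate, group A (no bias): {rate_group_a:.3f}")
print(f"missed-hypoxemia rate, group B (biased):  {rate_group_b:.3f}")
```

Because both groups share the same true-saturation distribution and the same decision rule, any gap between the two rates in this sketch comes from the measurement bias alone, which is the sense in which an upstream information bias can surface downstream as a treatment disparity.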
