Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center Open Data Commons
Heather M. Whitney, Natalie Baughan, Kyle J. Myers, Karen Drukker, Judy Gichoya, Brad Bower, Weijie Chen, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Sanmi Koyejo, Rui C. Sá, Berkman Sahiner, Zi Zhang, Maryellen L. Giger

TL;DR
This study evaluates how well the demographic makeup of the MIDRC open medical imaging dataset reflects the US population and COVID-19 cases over time, highlighting the importance of representativeness for fair AI development.
Contribution
It introduces a longitudinal assessment method that uses the Jensen-Shannon distance (JSD) to measure demographic representativeness in open medical imaging data.
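For reference, the standard definition of the Jensen-Shannon distance between two categorical distributions P = (p_i) and Q = (q_i) over the same demographic categories (for example, sex proportions in MIDRC versus the 2020 US Census) is sketched below. It assumes a base-2 logarithm, under which the distance lies in [0, 1]: 0 for identical distributions and 1 for distributions with no categories in common. The paper's exact conventions (log base, handling of unreported categories) may differ.

```latex
M = \tfrac{1}{2}\,(P + Q), \qquad
D_{\mathrm{KL}}(P \,\|\, M) = \sum_{i} p_i \log_2 \frac{p_i}{m_i}, \\[4pt]
\mathrm{JSD}(P, Q) = \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M)}
```

Terms with p_i = 0 contribute zero to the sum, and m_i > 0 whenever p_i > 0, so the expression is always finite.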
Findings
Demographic representativeness varies by ethnicity and race due to data reporting gaps.
Sex and race distributions have remained stable over time.
Metrics like JSD are useful for tracking dataset representativeness.
Abstract
Purpose: The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary imaging dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). Approach: The Jensen-Shannon distance (JSD) was used to longitudinally measure the similarity of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the…
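The approach described in the abstract, comparing dated snapshots of a dataset's demographic counts against a fixed reference such as Census or CDC counts, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `js_distance` and `longitudinal_jsd` are hypothetical, the logarithm is base 2, a category absent from one table is treated as a zero count, and the toy counts in the demo are made up rather than taken from MIDRC, the Census, or the CDC. Whether "not reported" is kept as its own category is a modeling choice that can change the result.

```python
import math
from typing import Dict, Mapping


def js_distance(p_counts: Mapping[str, float], q_counts: Mapping[str, float]) -> float:
    """Jensen-Shannon distance (log base 2, range [0, 1]) between two count tables.

    Counts are normalized to proportions over the union of categories; a category
    missing from one table is treated as having zero count there.
    """
    categories = sorted(set(p_counts) | set(q_counts))
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each distribution needs a positive total count")
    p = [p_counts.get(c, 0) / p_total for c in categories]
    q = [q_counts.get(c, 0) / q_total for c in categories]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution M

    def kl(a, b):
        # KL divergence in bits; terms with a_i == 0 contribute 0 by convention,
        # and b_i > 0 whenever a_i > 0 because b is the mixture M.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding error


def longitudinal_jsd(
    snapshots: Mapping[str, Mapping[str, float]], reference: Mapping[str, float]
) -> Dict[str, float]:
    """JSD of each dated snapshot of a dataset against one fixed reference distribution."""
    return {label: js_distance(counts, reference) for label, counts in snapshots.items()}


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only; not MIDRC, Census, or CDC data.
    reference = {"Female": 51, "Male": 49}
    snapshots = {
        "snapshot_1": {"Female": 30, "Male": 70},
        "snapshot_2": {"Female": 45, "Male": 55},
    }
    for label, d in longitudinal_jsd(snapshots, reference).items():
        print(f"{label}: JSD = {d:.3f}")
```

Keeping the reference fixed while the snapshots change is what makes the values comparable over time: a falling JSD means the dataset is moving closer to the reference in that demographic category. For normalized inputs, SciPy's `scipy.spatial.distance.jensenshannon(p, q, base=2)` should return the same distance.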
Taxonomy
Topics: COVID-19 and healthcare impacts · Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI
