Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays
Ethan Dack, Chengliang Dai

TL;DR
This paper investigates dataset biases in open-source chest X-ray datasets by applying classification tasks, transformations, and various neural network architectures to assess whether biases exist and how they affect AI medical imaging research.
Contribution
The study systematically analyzes dataset biases in popular chest X-ray datasets using multiple models and transformations, highlighting the need for more explainable and unbiased medical imaging datasets.
Findings
Biases are detectable in open-source chest X-ray datasets.
Transformations influence the detectability of dataset biases.
Different neural network architectures reveal varying levels of bias sensitivity.
Abstract
Recent works have revisited the infamous task "Name That Dataset", demonstrating that non-medical datasets contain underlying biases and that the dataset origin task can be solved with high accuracy. In this work, we revisit the same task applied to popular open-source chest X-ray datasets. Medical images are naturally more difficult to release as open source because of their sensitive nature, which has led to certain open-source datasets being extremely popular for research purposes. By performing the same task, we wish to explore whether dataset bias also exists in these datasets. To extend our work, we apply simple transformations to the datasets, repeat the same task, and perform an analysis to identify and explain any detected biases. Given the importance of AI applications in medical imaging, it is vital to establish whether modern methods are taking shortcuts or are focused on the…
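The dataset origin task described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the paper works with real chest X-ray datasets and neural networks, whereas the example below uses synthetic "images", a nearest-centroid classifier, and a made-up per-dataset brightness offset as the only bias. All names (`make_image`, `origin_accuracy`, `center`) and all numeric values are hypothetical, chosen only to show the idea that accuracy well above chance signals a detectable bias, and that a simple transformation can change how detectable it is.

```python
import random

def make_image(offset, rng, n_pixels=64):
    # Synthetic stand-in for an image: Gaussian noise plus a
    # dataset-specific brightness offset (the assumed "bias").
    return [rng.gauss(0.5 + offset, 0.1) for _ in range(n_pixels)]

def make_split(offsets, per_dataset, rng):
    # The label is the index of the dataset the image came from.
    return [(make_image(off, rng), label)
            for label, off in enumerate(offsets)
            for _ in range(per_dataset)]

def center(image):
    # Example transformation: subtract the per-image mean intensity.
    mean = sum(image) / len(image)
    return [p - mean for p in image]

def fit_centroids(samples, n_classes):
    # Mean image per dataset label.
    n_pixels = len(samples[0][0])
    sums = [[0.0] * n_pixels for _ in range(n_classes)]
    counts = [0] * n_classes
    for image, label in samples:
        counts[label] += 1
        for i, p in enumerate(image):
            sums[label][i] += p
    return [[s / counts[k] for s in sums[k]] for k in range(n_classes)]

def predict(centroids, image):
    # Assign the image to the dataset with the closest mean image.
    dists = [sum((p - c) ** 2 for p, c in zip(image, cen)) for cen in centroids]
    return dists.index(min(dists))

def origin_accuracy(train, test, n_classes, transform=lambda img: img):
    # Fit on transformed training images, score on transformed test images.
    train_t = [(transform(img), y) for img, y in train]
    centroids = fit_centroids(train_t, n_classes)
    hits = sum(predict(centroids, transform(img)) == y for img, y in test)
    return hits / len(test)

rng = random.Random(0)
offsets = [0.0, 0.1, 0.2]          # hypothetical per-dataset brightness shifts
train = make_split(offsets, 100, rng)
test = make_split(offsets, 50, rng)

raw_acc = origin_accuracy(train, test, len(offsets))
centered_acc = origin_accuracy(train, test, len(offsets), transform=center)
chance = 1 / len(offsets)

print(f"chance level:        {chance:.2f}")
print(f"raw images:          {raw_acc:.2f}")
print(f"mean-centered images: {centered_acc:.2f}")
```

In this toy setup the classifier identifies the source dataset far more often than chance on the raw images, because brightness alone gives the origin away. After mean-centering, the only planted bias is removed and accuracy falls back toward chance. Real datasets can differ in many ways at once, so a single transformation is unlikely to remove every cue.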
Taxonomy
Topics: COVID-19 diagnosis using AI · AI in cancer detection · Radiomics and Machine Learning in Medical Imaging
