Fairness and Robustness of CLIP-Based Models for Chest X-rays
Théo Sourget, David Restrepo, Céline Hudelot, Enzo Ferrante, Stergios Christodoulidis, Maria Vakalopoulou

TL;DR
This paper evaluates the fairness and robustness of CLIP-based models in chest X-ray classification, revealing performance disparities across patient subgroups and reliance on spurious correlations, highlighting areas for improvement in medical AI fairness.
Contribution
It provides a comprehensive assessment of CLIP-based models' fairness and robustness in radiology, including analysis of embeddings and identification of biases and reliance on shortcuts.
Findings
Performance gaps across age groups
Models rely on spurious features like chest drains
Embeddings reveal sensitive attribute information
Abstract
Motivated by the strong performance of CLIP-based models in natural image-text domains, recent efforts have adapted these architectures to medical tasks, particularly in radiology, where large paired datasets of images and reports, such as chest X-rays, are available. While these models have shown encouraging results in terms of accuracy and discriminative performance, their fairness and robustness across clinical tasks remain largely underexplored. In this study, we extensively evaluate six widely used CLIP-based models on chest X-ray classification using three publicly available datasets: MIMIC-CXR, NIH-CXR14, and NEATX. We assess the models' fairness across six conditions and patient subgroups based on age, sex, and race. Additionally, we assess the robustness to shortcut learning by evaluating performance on pneumothorax cases with and without chest drains. Our results…
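The two evaluations described in the abstract (performance per patient subgroup, and pneumothorax performance with versus without chest drains) can be illustrated with a minimal sketch. This is not the authors' code: the choice of AUROC as the metric, the max-minus-min gap, the function names (`auc`, `subgroup_auc_gap`, `auc_by_drain`), and the toy scores are all assumptions made for illustration only.

```python
from collections import defaultdict

def auc(labels, scores):
    """AUROC via pairwise comparison (Mann-Whitney U); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc_gap(labels, scores, groups):
    """Per-subgroup AUROC and the max-min gap (an assumed fairness summary)."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    per_group = {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}
    return per_group, max(per_group.values()) - min(per_group.values())

def auc_by_drain(labels, scores, has_drain):
    """AUROC of positives with / without a drain against the same negatives."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    result = {}
    for flag in (True, False):
        pos = [s for y, s, d in zip(labels, scores, has_drain)
               if y == 1 and d == flag]
        result[flag] = auc([1] * len(pos) + [0] * len(neg), pos + neg)
    return result

# Toy data (hypothetical): model scores for one condition, two age groups.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.4, 0.2]
groups = ["young"] * 4 + ["old"] * 4
per_group, gap = subgroup_auc_gap(labels, scores, groups)

# Toy data (hypothetical): pneumothorax scores, some positives with a drain.
ptx_labels = [1, 1, 1, 1, 0, 0]
ptx_scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2]
ptx_drain = [True, True, False, False, False, False]
drain_auc = auc_by_drain(ptx_labels, ptx_scores, ptx_drain)
```

In this toy setup, a large gap between subgroups, or a much higher score for positives with a drain than without, would be the kind of signal the abstract refers to as a performance disparity or shortcut reliance.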
