Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval
Demetrio Deanda, Yuktha Priya Masupalli, Jeong Yang, Young Lee, Zechun Cao, Gongbo Liang

TL;DR
This paper benchmarks the robustness of four contrastive learning models for medical image-report retrieval, revealing high sensitivity to out-of-distribution data and emphasizing the need for domain-specific training to improve reliability.
Contribution
It introduces an occlusion-based robustness benchmark for contrastive models in medical retrieval and compares their performance, highlighting the importance of domain-specific training data.
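The summary does not say how the occlusion is applied, so the following is only a minimal sketch of one plausible scheme: zeroing out a chosen fraction of non-overlapping square tiles. The function name `occlude`, the tile size, and the zero fill value are all assumptions made for illustration, not the authors' protocol.

```python
import numpy as np

def occlude(image, fraction, patch=16, seed=0):
    """Return a copy of `image` with about `fraction` of its tiles set to 0.

    Hypothetical helper: the tiling scheme, patch size, and fill value
    are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    out = image.copy()
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    n_tiles = rows * cols
    n_masked = int(round(fraction * n_tiles))
    # Pick distinct tile indices so the masked area matches the request.
    chosen = rng.choice(n_tiles, size=n_masked, replace=False)
    for idx in chosen:
        r, c = divmod(int(idx), cols)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out
```

Sweeping `fraction` over a range of values would yield a series of increasingly corrupted copies of each image, which is the kind of input a robustness benchmark of this sort needs.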
Findings
All models are highly sensitive to image occlusion.
MedCLIP is slightly more robust than the other models, but its overall performance remains behind CXR-CLIP and CXR-RePaiR.
CLIP, trained on general-purpose data, performs poorly on medical data.
Abstract
Medical images and reports offer invaluable insights into patient health. The heterogeneity and complexity of these data hinder effective analysis. To bridge this gap, we investigate contrastive learning models for cross-domain retrieval, which associates medical images with their corresponding clinical reports. This study benchmarks the robustness of four state-of-the-art contrastive learning models: CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. We introduce an occlusion retrieval task to evaluate model performance under varying levels of image corruption. Our findings reveal that all evaluated models are highly sensitive to out-of-distribution data, as evidenced by the proportional decrease in performance with increasing occlusion levels. While MedCLIP exhibits slightly more robustness, its overall performance remains significantly behind CXR-CLIP and CXR-RePaiR. CLIP, trained on a…
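To make the retrieval setting concrete, here is a minimal sketch of how image-to-report retrieval might be scored from paired embeddings. It assumes row i of each matrix belongs to the same study, ranks reports by cosine similarity, and reports Recall@K. The metric choice and the name `recall_at_k` are assumptions for illustration; the excerpt above does not state which metric the paper uses.

```python
import numpy as np

def recall_at_k(image_emb, report_emb, k=1):
    """Fraction of images whose paired report ranks in the top k.

    Assumes row i of `image_emb` and row i of `report_emb` are a
    matching pair (an illustrative convention, not from the paper).
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)
    sim = img @ txt.T
    # Indices of the k most similar reports for each image.
    topk = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(sim.shape[0])[:, None]
    return float((topk == targets).any(axis=1).mean())
```

Computing this score on embeddings of clean images and again on embeddings of occluded images would give one way to quantify the performance drop described in the abstract.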
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning · Contrastive Language-Image Pre-training
