Multi-task Cross-modal Learning for Chest X-ray Image Retrieval
Zhaohui Liang, Sivaramakrishnan Rajaraman, Niccolo Marini, Zhiyun Xue, Sameer Antani

TL;DR
This paper fine-tunes BiomedCLIP with a multi-task learning framework to improve clinically relevant retrieval of radiology reports from chest X-ray (CXR) image queries.
Contribution
The study adapts BiomedCLIP for medical image-text retrieval by adding a lightweight MLP projector head trained with a composite loss that combines binary cross-entropy (normal vs. abnormal), supervised contrastive, and CLIP losses.
Findings
Enhanced retrieval accuracy over baseline models
Clearer semantic clustering of normal and abnormal cases
Improved diagnostic sensitivity in visualizations
Abstract
CLIP and BiomedCLIP are examples of vision-language foundation models and offer strong cross-modal embeddings; however, they are not optimized for fine-grained medical retrieval tasks, such as retrieving clinically relevant radiology reports using chest X-ray (CXR) image queries. To address this shortcoming, we propose a multi-task learning framework to fine-tune BiomedCLIP and evaluate improvements to CXR image-text retrieval. Using BiomedCLIP as the backbone, we incorporate a lightweight MLP projector head trained with a multi-task composite loss function that includes: (1) a binary cross-entropy loss to distinguish normal from abnormal CXR studies, (2) a supervised contrastive loss to reinforce intra-class consistency, and (3) a CLIP loss to maintain cross-modal alignment. Experimental results demonstrate that the fine-tuned model achieves more balanced and clinically meaningful…
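The three-term composite loss described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the temperatures, the equal weighting of the three terms, and the choice to apply the supervised contrastive term to the image embeddings are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def bce_with_logits(logits, labels):
    """Binary cross-entropy on raw logits (0 = normal, 1 = abnormal)."""
    per_sample = (np.maximum(logits, 0) - logits * labels
                  + np.log1p(np.exp(-np.abs(logits))))
    return float(per_sample.mean())

def supervised_contrastive(emb, labels, temperature=0.1):
    """Pull together embeddings that share a label (temperature is assumed)."""
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one same-class partner
    if not valid.any():
        return 0.0
    z = l2_normalize(emb)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)  # drop self-pairs
    log_prob = log_softmax(sim, axis=1)
    sum_pos = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(sum_pos[valid] / pos_count[valid]).mean())

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-to-text and text-to-image cross-entropy over a batch."""
    logits = l2_normalize(img_emb) @ l2_normalize(txt_emb).T / temperature
    idx = np.arange(len(logits))
    i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((i2t + t2i) / 2)

def multitask_loss(img_emb, txt_emb, cls_logits, labels, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are an assumption."""
    w_bce, w_sup, w_clip = weights
    parts = {
        "bce": w_bce * bce_with_logits(cls_logits, labels),
        "supcon": w_sup * supervised_contrastive(img_emb, labels),
        "clip": w_clip * clip_loss(img_emb, txt_emb),
    }
    return sum(parts.values()), parts
```

In this sketch, `img_emb` and `txt_emb` stand in for projector-head outputs for paired CXR images and reports, `cls_logits` for the output of a normal/abnormal classifier, and `labels` for the binary study labels. Returning the per-term values alongside the total makes it easier to see which objective dominates during fine-tuning.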
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
