Multi-task Cross-modal Learning for Chest X-ray Image Retrieval

Zhaohui Liang; Sivaramakrishnan Rajaraman; Niccolo Marini; Zhiyun Xue; Sameer Antani

arXiv:2601.05399 · cs.CV · January 12, 2026

Open Access

TL;DR

This paper introduces a multi-task learning framework for fine-tuning BiomedCLIP, improving its retrieval of clinically relevant chest X-ray (CXR) images and radiology reports by sharpening semantic understanding and domain specificity.

Contribution

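The study presents a multi-task training approach that adapts BiomedCLIP for medical image-text retrieval. A lightweight MLP projector head is trained on top of the BiomedCLIP backbone with a composite loss that combines binary cross-entropy (normal vs. abnormal), supervised contrastive, and CLIP terms.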

Findings

01 Enhanced retrieval accuracy over baseline models

02 Clearer semantic clustering of normal and abnormal cases

03 Improved diagnostic sensitivity in visualizations

Abstract

CLIP and BiomedCLIP are examples of vision-language foundation models and offer strong cross-modal embeddings; however, they are not optimized for fine-grained medical retrieval tasks, such as retrieving clinically relevant radiology reports using chest X-ray (CXR) image queries. To address this shortcoming, we propose a multi-task learning framework to fine-tune BiomedCLIP and evaluate improvements to CXR image-text retrieval. Using BiomedCLIP as the backbone, we incorporate a lightweight MLP projector head trained with a multi-task composite loss function that includes: (1) a binary cross-entropy loss to distinguish normal from abnormal CXR studies, (2) a supervised contrastive loss to reinforce intra-class consistency, and (3) a CLIP loss to maintain cross-modal alignment. Experimental results demonstrate that the fine-tuned model achieves more balanced and clinically meaningful…
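The three-term composite loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperatures (0.1 and 0.07), the equal loss weights, and the choice to apply the supervised contrastive term to the image embeddings are all assumptions made for illustration, since the abstract does not specify them. The inputs stand in for the outputs of the MLP projector head.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis=1):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def bce_loss(logits, labels):
    # Binary cross-entropy on raw logits (normal = 0, abnormal = 1),
    # using the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

def supcon_loss(emb, labels, temperature=0.1):
    # Supervised contrastive loss: pull same-label embeddings together.
    # Assumes every class in the batch has at least two samples.
    z = l2_normalize(emb)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)  # drop self-pairs
    log_prob = log_softmax(sim, axis=1)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / pos_count[valid]
    return -np.mean(mean_pos)

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric cross-entropy over the image-text similarity matrix;
    # row i of img_emb is paired with row i of txt_emb.
    logits = l2_normalize(img_emb) @ l2_normalize(txt_emb).T / temperature
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

def composite_loss(cls_logits, img_emb, txt_emb, labels,
                   w_bce=1.0, w_supcon=1.0, w_clip=1.0):
    # Weighted sum of the three terms; the weights are hypothetical.
    return (w_bce * bce_loss(cls_logits, labels)
            + w_supcon * supcon_loss(img_emb, labels)
            + w_clip * clip_loss(img_emb, txt_emb))
```

In this sketch, `cls_logits` would come from a classification output, and `img_emb` and `txt_emb` from the projected image and report embeddings of the same batch. Raising or lowering one of the weights shifts the balance between classification, intra-class consistency, and cross-modal alignment.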

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning