UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities

Muhammad Uzair Khattak, Shahina Kunhimon, Muzammal Naseer, Salman Khan, and Fahad Shahbaz Khan

arXiv:2412.10372 · cs.CV · December 16, 2024

TL;DR

UniMed-CLIP introduces a large-scale, multi-modal medical dataset and a unified vision-language model trained on diverse medical imaging modalities, significantly improving zero-shot performance and generalization across medical tasks.

Contribution

The paper presents UniMed, a comprehensive open-source dataset and a unified VLM for multiple medical imaging modalities, enabling scalable pretraining and better cross-modality generalization.

Findings

01. UniMed-CLIP outperforms existing generalist VLMs on medical tasks.

02. It achieves a +12.61 absolute gain over BiomedCLIP in zero-shot evaluations.

03. It uses 3x less training data than proprietary models.

Abstract

Vision-Language Models (VLMs) trained via contrastive learning have achieved notable success in natural image tasks. However, their application in the medical domain remains limited due to the scarcity of openly accessible, large-scale medical image-text datasets. Existing medical VLMs either train on closed-source proprietary or relatively small open-source datasets that do not generalize well. Similarly, most models remain specific to a single or limited number of medical imaging domains, again restricting their applicability to other modalities. To address this gap, we introduce UniMed, a large-scale, open-source multi-modal medical dataset comprising over 5.3 million image-text pairs across six diverse imaging modalities: X-ray, CT, MRI, Ultrasound, Pathology, and Fundus. UniMed is developed using a data-collection framework that leverages Large Language Models (LLMs) to transform…

Code & Models

Models

TahaKoleilat/MedCLIPSeg (Hugging Face model)

Taxonomy

Methods: Contrastive Learning
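
For readers unfamiliar with the contrastive learning method listed above, the following is a minimal sketch of a symmetric CLIP-style image-text contrastive loss. It is illustrative only and is not taken from the UniMed-CLIP code. The function names, the toy embeddings, and the temperature of 0.07 are assumptions made for this example, and the sketch assumes every embedding is a non-zero vector.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    The temperature value is an illustrative assumption.
    """
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = _diagonal_cross_entropy(logits)
    text_to_image = _diagonal_cross_entropy(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy batch: correctly paired embeddings give a much lower loss than swapped ones.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss pulls each image embedding toward its paired caption and pushes it away from the other captions in the batch, which is what lets a model trained this way be used for zero-shot classification by comparing an image against text prompts.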