Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy, Yasaman Parhizkar, Franklin Ogidi, Vahid Reza Khazaie, Michael Colacci, Ali Etemad, Elham Dolatabadi, Arash Afkanpour

TL;DR
This paper benchmarks various contrastive learning methods for medical multimodal representation, evaluating their transferability, the benefit of multimodal versus unimodal training, and the impact of feature granularity across multiple tasks.
Contribution
It provides a comprehensive comparison of eight contrastive frameworks in the medical domain, revealing insights into their transferability, training strategies, and feature granularity effects.
Findings
General-domain representations transfer well to medical tasks.
Multimodal contrastive training alone is insufficient for optimal performance.
Fine-grained features improve multimodal medical representation effectiveness.
Abstract
We perform a comprehensive benchmarking of contrastive frameworks for learning multimodal representations in the medical domain. Through this study, we aim to answer the following research questions: (i) How transferable are general-domain representations to the medical domain? (ii) Is multimodal contrastive training sufficient, or does it benefit from unimodal training as well? (iii) What is the impact of feature granularity on the effectiveness of multimodal medical representation learning? To answer these questions, we investigate eight contrastive learning approaches under identical training setups, train them on 2.8 million image-text pairs from four datasets, and evaluate them on 25 downstream tasks, including classification (zero-shot and linear probing), image-to-text and text-to-image retrieval, and visual question-answering. Our findings suggest a positive answer to the…
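As a rough illustration of the kind of objective and retrieval metric the abstract refers to, the sketch below implements a generic CLIP-style symmetric image-text contrastive loss and a Recall@K score in NumPy. It is not taken from the paper and is not claimed to match any of the eight benchmarked frameworks. The function names, the temperature of 0.07, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each vector to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to be a matched
    image-text pair; every other row in the batch acts as a negative.
    The temperature value is an illustrative assumption.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


def recall_at_k(image_emb, text_emb, k=1):
    """Fraction of images whose matched text is among the k most similar texts."""
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == np.arange(sims.shape[0])[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    # Toy embeddings (hypothetical): perfectly aligned pairs vs. mismatched pairs.
    emb = np.eye(4)
    print("aligned loss:", symmetric_contrastive_loss(emb, emb))
    print("mismatched loss:", symmetric_contrastive_loss(emb, emb[::-1]))
    print("aligned Recall@1:", recall_at_k(emb, emb, k=1))
```

In this toy setup the aligned pairs should give a lower loss than the mismatched ones. In practice the embeddings would come from trained image and text encoders rather than an identity matrix.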
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Contrastive Learning
