Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

Shuvendu Roy; Yasaman Parhizkar; Franklin Ogidi; Vahid Reza Khazaie; Michael Colacci; Ali Etemad; Elham Dolatabadi; Arash Afkanpour

arXiv:2406.07450 · cs.CV · June 12, 2024

TL;DR

This paper benchmarks eight contrastive learning methods for multimodal medical representation learning. It evaluates how well general-domain representations transfer to the medical domain, whether multimodal training benefits from unimodal training, and how feature granularity affects results across 25 downstream tasks.

Contribution

It provides a comprehensive comparison of eight contrastive frameworks in the medical domain, revealing insights into their transferability, training strategies, and feature granularity effects.

Findings

01. General-domain representations transfer well to medical tasks.

02. Multimodal contrastive training alone is insufficient for optimal performance.

03. Fine-grained features improve multimodal medical representation effectiveness.

Abstract

We perform a comprehensive benchmarking of contrastive frameworks for learning multimodal representations in the medical domain. Through this study, we aim to answer the following research questions: (i) How transferable are general-domain representations to the medical domain? (ii) Is multimodal contrastive training sufficient, or does it benefit from unimodal training as well? (iii) What is the impact of feature granularity on the effectiveness of multimodal medical representation learning? To answer these questions, we investigate eight contrastive learning approaches under identical training setups, and train them on 2.8 million image-text pairs from four datasets, and evaluate them on 25 downstream tasks, including classification (zero-shot and linear probing), image-to-text and text-to-image retrieval, and visual question-answering. Our findings suggest a positive answer to the…
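For readers unfamiliar with the objective behind vision-language contrastive training, the following is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss. It is a generic illustration, not the code of any of the eight frameworks benchmarked in the paper. The function names, the temperature of 0.07, and the toy two-dimensional embeddings are all assumptions made for this example, and the embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Hypothetical helper: symmetric InfoNCE loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same image-text pair.
    The temperature value is an illustrative assumption, not taken from the paper.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = cross_entropy_diagonal(logits)
    text_to_image = cross_entropy_diagonal(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy batch: matched pairs should score a lower loss than mismatched pairs.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss pulls each image embedding toward its paired text and pushes it away from the other texts in the batch, and does the same in the text-to-image direction. The same similarity matrix is what zero-shot classification and retrieval evaluations typically rank over.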

Code & Models

Repositories

shuvenduroy/multimodal (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Contrastive Learning