Increasing Textual Context Size Boosts Medical Image-Text Matching

Idan Glassberg; Tom Hope

arXiv:2303.13340 · cs.LG · March 24, 2023 · 1 citation

TL;DR

This paper introduces ClipMD, a CLIP-based model trained with a sliding window over captions so that it can encode longer textual context, which yields state-of-the-art results on two medical image-text matching datasets.

Contribution

The paper presents ClipMD, a novel approach that uses a sliding window technique to encode longer textual contexts for improved medical image-text matching.

Findings

01 ClipMD outperforms existing models on two medical datasets.

02 Increasing textual context size improves matching performance.

03 The approach is simple yet effective for medical image-text tasks.

Abstract

This short technical report demonstrates a simple technique that yields state-of-the-art results in medical image-text matching tasks. We analyze the use of OpenAI's CLIP, a general image-text matching model, and observe that CLIP's limited textual input size has a negative impact on downstream performance in the medical domain, where encoding longer textual contexts is often required. We thus train and release ClipMD, which is trained with a simple sliding window technique to encode textual captions. ClipMD was tested on two medical image-text datasets and compared with other image-text matching models. The results show that ClipMD outperforms other models on both datasets by a large margin. We make our code and pretrained model publicly available.
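The abstract does not spell out the sliding window technique, so the following is only a minimal sketch of one way such a scheme could work, not the authors' implementation. It assumes the caption is already tokenized, that each window is encoded independently, and that the window embeddings are mean-pooled into one caption vector. The window size of 77 matches CLIP's text context length. The stride of 38, the mean pooling, and the names `sliding_windows`, `encode_long_text`, and `toy_encoder` are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence


def sliding_windows(token_ids: Sequence[int], window: int = 77,
                    stride: int = 38) -> List[List[int]]:
    """Split token_ids into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    tokens = list(token_ids)
    if len(tokens) <= window:
        # Short captions fit in a single window and need no splitting.
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the caption
        start += stride
    return windows


def encode_long_text(token_ids: Sequence[int],
                     encode_window: Callable[[List[int]], List[float]],
                     window: int = 77, stride: int = 38) -> List[float]:
    """Encode each window separately and mean-pool the embeddings.

    Mean pooling is an assumption of this sketch; the paper's
    aggregation step may differ.
    """
    chunks = sliding_windows(token_ids, window, stride)
    embeddings = [encode_window(chunk) for chunk in chunks]
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings)
            for i in range(dim)]


def toy_encoder(chunk: List[int]) -> List[float]:
    """Hypothetical stand-in for a text encoder: [mean token id, length]."""
    return [sum(chunk) / len(chunk), float(len(chunk))]


if __name__ == "__main__":
    caption_tokens = list(range(10))
    print(sliding_windows(caption_tokens, window=4, stride=2))
    print(encode_long_text(caption_tokens, toy_encoder, window=4, stride=2))
```

In a real pipeline, `toy_encoder` would be replaced by a call into a CLIP text encoder, and the pooled vector would be compared with the image embedding as usual. Overlapping windows are used here so that a phrase falling on a window boundary still appears intact in at least one window.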

Code & Models

Repositories

https://huggingface.co/Idan0405/ClipMD (official)

Models

Idan0405/ClipMD · model · 9 downloads · 8 likes

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling

Methods: Contrastive Language-Image Pre-training