Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
Ziyuan Qin, Huahui Yi, Qicheng Lao, Kang Li

TL;DR
This study explores how pre-trained vision language models can be effectively transferred to medical imaging, demonstrating that well-designed prompts and automatic prompt generation significantly enhance zero-shot and fine-tuned performance across diverse medical datasets.
Contribution
The paper introduces methods for automatic medical prompt generation and demonstrates their effectiveness in improving VLM performance in medical image understanding.
Findings
Well-designed prompts improve zero-shot accuracy.
Automatic prompt generation injects expert medical knowledge.
Fine-tuned models outperform supervised counterparts.
Abstract
Large-scale pre-trained vision language models (VLMs) have shown remarkable domain transfer capability on natural images. However, it remains unknown whether this capability also applies to the medical image domain. This paper thoroughly studies the knowledge transferability of pre-trained VLMs to the medical domain, where we show that well-designed medical prompts are the key to eliciting knowledge from pre-trained VLMs. We demonstrate that by prompting with expressive attributes that are shared between domains, the VLM can carry knowledge across domains and improve its generalization. This mechanism enables VLMs to recognize novel objects with few or no image samples. Furthermore, to avoid the laborious manual design process, we develop three approaches for automatically generating medical prompts, which can inject expert-level medical knowledge and image-specific…
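The attribute-prompting idea in the abstract can be illustrated with a toy, CLIP-style zero-shot scorer. This is a minimal sketch under loudly stated assumptions, not the authors' method or model: `build_prompt`, `toy_encode`, and `zero_shot_scores` are hypothetical names, the prompt template is invented, and the "encoder" is a bag-of-words vector over a small vocabulary standing in for what a natural-image VLM already understands. Medical class names such as "polyp" are deliberately left out of that vocabulary, so a bare class-name prompt carries no signal, while prompts that add shared visual attributes (color, shape) let the image match the right class.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def build_prompt(class_name: str, attributes: Sequence[str]) -> str:
    """Compose a prompt from a class name plus shared visual attributes.
    Hypothetical template, not the paper's exact wording."""
    if not attributes:
        return f"a photo of a {class_name}"
    return f"a photo of a {class_name}, which is " + ", ".join(attributes)


def toy_encode(text: str, vocab: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a VLM encoder: an L2-normalised bag-of-words
    vector restricted to words the 'pretrained' model already knows."""
    counts = Counter(w.strip(",.").lower() for w in text.split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def zero_shot_scores(image_vec: Sequence[float], prompts: Sequence[str],
                     vocab: Sequence[str], temperature: float = 0.1) -> List[float]:
    """Softmax over image-prompt cosine similarities (vectors are unit-norm
    or zero, so the dot product is the cosine similarity)."""
    sims = [sum(x * y for x, y in zip(image_vec, toy_encode(p, vocab)))
            for p in prompts]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy setup: the vocabulary holds generic attribute words only (assumption).
vocab = ["pink", "round", "raised", "dark", "irregular", "flat"]
classes: Dict[str, List[str]] = {
    "polyp": ["pink", "round", "raised"],
    "melanoma": ["dark", "irregular", "flat"],
}
image_vec = toy_encode("pink round raised", vocab)  # pretend image features

bare = [build_prompt(c, []) for c in classes]
rich = [build_prompt(c, a) for c, a in classes.items()]
print(zero_shot_scores(image_vec, bare, vocab))  # class names alone: a tie
print(zero_shot_scores(image_vec, rich, vocab))  # attributes favour "polyp"
```

With bare class names both prompts encode to the zero vector, so the scores split evenly; once the shared attributes are added, nearly all the probability mass goes to the matching class. A real VLM replaces the toy encoder with learned image and text towers, but the scoring step (similarity followed by a temperature-scaled softmax) has the same shape.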
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
