Aligning Medical Images with General Knowledge from Large Language Models
Xiao Fang, Yi Lin, Dong Zhang, Kwang-Ting Cheng, Hao Chen

TL;DR
This paper introduces ViP, a framework that adapts large vision-language models such as CLIP to medical image analysis by extracting visual symptoms from large language models and using them to guide prompt learning.
Contribution
The paper proposes a visual symptom-guided prompt learning framework that transfers general knowledge from pre-trained vision-language models and large language models to medical imaging tasks.
Findings
ViP outperforms state-of-the-art methods on two datasets.
The framework effectively extracts visual symptoms from language models.
ViP demonstrates strong generalization in medical image analysis.
Abstract
Pre-trained large vision-language models (VLMs) like CLIP have revolutionized visual representation learning by using natural language as supervision, and have demonstrated promising generalization ability. In this work, we propose ViP, a novel visual symptom-guided prompt learning framework for medical image analysis, which facilitates general knowledge transfer from CLIP. ViP consists of two key components: a visual symptom generator (VSG) and a dual-prompt network. Specifically, VSG aims to extract explicable visual symptoms from pre-trained large language models, while the dual-prompt network uses these visual symptoms to guide the training of two learnable prompt modules, i.e., context prompt and merge prompt, which effectively adapt our framework to medical image analysis via large VLMs. Extensive experimental results demonstrate that ViP can outperform state-of-the-art methods on…
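To make the idea in the abstract concrete, the following is a minimal sketch of symptom-guided classification in a CLIP-style embedding space. It is not the authors' implementation: the embeddings are toy vectors rather than real CLIP outputs, the class names are placeholders, the function names are hypothetical, and a plain average stands in for the learnable merge prompt, whose details the abstract does not give. The temperature value is likewise an assumption.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def merge_symptoms(symptom_embeddings):
    """Average one class's symptom embeddings into a single class vector.
    Assumption: a simple mean replaces the learnable merge prompt."""
    n = len(symptom_embeddings)
    dim = len(symptom_embeddings[0])
    return [sum(e[i] for e in symptom_embeddings) / n for i in range(dim)]

def classify(image_embedding, class_symptoms, temperature=0.07):
    """Softmax over image-to-class cosine similarities."""
    names = list(class_symptoms)
    logits = [
        cosine(image_embedding, merge_symptoms(class_symptoms[name])) / temperature
        for name in names
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}

# Toy data: two placeholder classes, each described by two symptom embeddings.
class_symptoms = {
    "class_a": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "class_b": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
image_embedding = [0.8, 0.2, 0.05]

probs = classify(image_embedding, class_symptoms)
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

In a real pipeline, the symptom embeddings would come from a text encoder applied to symptom descriptions produced by a language model, and the image embedding from the paired image encoder.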
Taxonomy
Topics: AI in cancer detection · Image Retrieval and Classification Techniques · Radiomics and Machine Learning in Medical Imaging
Methods: Contrastive Language-Image Pre-training
