Medical Knowledge Intervention Prompt Tuning for Medical Image Classification
Ye Du, Nanxi Yu, Shujun Wang

TL;DR
This paper introduces CILMP, a novel prompt tuning method that leverages large language models to incorporate disease-specific medical knowledge into vision-language models, significantly improving medical image classification performance.
Contribution
CILMP is the first approach to integrate LLM-derived medical knowledge into prompt tuning for VLMs, enabling instance-adaptive prompts for better medical image classification.
Findings
CILMP outperforms existing prompt tuning methods across multiple datasets.
The method effectively captures disease-specific features in prompts.
Conditional intervention improves prompt adaptability to individual images.
Abstract
Vision-language foundation models (VLMs) have shown great potential in feature transfer and generalization across a wide spectrum of medical downstream tasks. However, fine-tuning these models is resource-intensive because of their large number of parameters. Prompt tuning has emerged as a viable way to reduce memory usage and training time while maintaining competitive performance. Nevertheless, existing prompt tuning methods cannot precisely distinguish different kinds of medical concepts, and therefore miss disease-specific features across various medical imaging modalities in medical image classification tasks. We find that Large Language Models (LLMs), trained on extensive text corpora, are particularly adept at providing this specialized medical knowledge. Motivated by this, we propose incorporating LLMs into the prompt tuning…
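To make the idea of instance-adaptive prompts concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes a sigmoid gate, computed from the image feature, that scales how much of a per-class knowledge vector is added to shared learnable context tokens. The names build_prompts, classify, and gate_w are hypothetical, the knowledge vectors are random stand-ins for LLM-derived disease representations, and mean pooling stands in for the VLM text encoder.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def build_prompts(ctx, knowledge, image_feat, gate_w):
    """Build one prompt per class, conditioned on a single image.

    ctx:        (n_ctx, d) shared learnable context tokens
    knowledge:  (n_cls, d) per-class knowledge vectors (assumed LLM-derived)
    image_feat: (d,)       feature of the current image
    gate_w:     (n_ctx, d) hypothetical gate weights
    """
    # One gate value per context token, driven by the image feature.
    gate = sigmoid(gate_w @ image_feat)  # (n_ctx,)
    # Broadcast to (n_cls, n_ctx, d): shared context plus gated class knowledge.
    prompts = ctx[None, :, :] + gate[None, :, None] * knowledge[:, None, :]
    return prompts, gate


def classify(prompts, image_feat, temperature=0.07):
    """Score classes by cosine similarity between pooled prompts and the image."""
    # Mean pooling is a stand-in for a real text encoder (assumption).
    text_feat = prompts.mean(axis=1)  # (n_cls, d)
    text_feat = text_feat / np.linalg.norm(text_feat, axis=1, keepdims=True)
    img = image_feat / np.linalg.norm(image_feat)
    logits = (text_feat @ img) / temperature
    logits = logits - logits.max()  # numerical stability before softmax
    exp = np.exp(logits)
    return exp / exp.sum()


rng = np.random.default_rng(0)
n_cls, n_ctx, d = 3, 4, 8
ctx = rng.normal(size=(n_ctx, d))
knowledge = rng.normal(size=(n_cls, d))
image_feat = rng.normal(size=d)
gate_w = 0.1 * rng.normal(size=(n_ctx, d))

prompts, gate = build_prompts(ctx, knowledge, image_feat, gate_w)
probs = classify(prompts, image_feat)
```

In this sketch a different image changes the gate and therefore the prompts, which is the sense in which the prompts are instance-adaptive. With the knowledge vectors set to zero, every class receives the same shared context.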
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
