TL;DR
BioVLM is a prompt-learning framework that improves cross-domain generalization in biomedical vision-language models by selecting prompts dynamically for each input, leveraging LLM priors, and keeping training lightweight.
Contribution
It proposes a learned prompt bank and a dynamic prompt selection method that improve generalization in biomedical VLMs without extensive backbone fine-tuning.
Findings
Achieves state-of-the-art results on 11 MedMNIST+ datasets.
Effectively couples sparse few-shot evidence with rich LLM priors.
Enables transfer to unseen categories and domains.
Abstract
Pretrained biomedical vision-language models (VLMs) such as BioMedCLIP perform well on average but often degrade on challenging modalities where inter-class margins are small and acquisition-specific variations are pronounced, especially under few-shot supervision and when modality priors differ substantially from the pretraining corpora. We propose BioVLM, a prompt-learning framework that improves cross-domain generalization without extensive backbone fine-tuning. BioVLM learns a diverse prompt bank and introduces dynamic prompt selection: for each input, it selects the most discriminative prompts via a low-entropy criterion on the predictive distribution, effectively coupling sparse few-shot evidence with rich LLM semantic priors. To strengthen this coupling, we distill high-confidence LLM-derived attributes and enforce robust knowledge transfer through strong/weak augmentation…
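The low-entropy selection criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_low_entropy_prompts`, the `(P, C, D)` prompt-bank layout, the top-`k` cutoff, the temperature `scale`, and the averaging of the selected prompts' predictions are all assumptions made for illustration. The sketch scores each prompt in the bank by the entropy of the class distribution it induces for one image and keeps the most confident ones.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def select_low_entropy_prompts(image_emb, prompt_bank, k=2, scale=100.0):
    """Pick the k prompts whose predictions for this image have lowest entropy.

    image_emb:   (D,) L2-normalized image embedding (hypothetical input).
    prompt_bank: (P, C, D) L2-normalized text embeddings, one per
                 prompt/class pair (hypothetical layout).
    k, scale:    assumed hyperparameters, not taken from the paper.
    """
    # Cosine-similarity logits for every prompt: shape (P, C).
    logits = scale * (prompt_bank @ image_emb)
    probs = softmax(logits, axis=-1)
    # Shannon entropy of each prompt's predictive distribution: shape (P,).
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    # Lowest entropy means the most confident (discriminative) prompts.
    chosen = np.argsort(entropy)[:k]
    # Assumption: fuse the selected prompts by averaging their probabilities.
    fused = probs[chosen].mean(axis=0)
    return chosen, fused, entropy
```

Because selection happens per input, different images can draw on different prompts from the same bank; how the selected prompts are actually fused in BioVLM is not stated in the abstract, so the mean used here is only a placeholder.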
