Semi-Supervised Few-Shot Adaptation of Vision-Language Models
Julio Silva-Rodríguez, Ender Konukoglu

TL;DR
This paper introduces a semi-supervised approach for adapting vision-language models to medical imaging tasks with limited labeled data, significantly reducing annotation effort and improving performance in low-shot scenarios.
Contribution
It proposes an efficient semi-supervised solver that propagates text-informed pseudo-labels, enhancing few-shot adaptation of VLMs in medical imaging.
Findings
Reduces labeling effort by over 50% in low-shot regimes
Improves classification performance on medical imaging tasks
Effectively propagates pseudo-labels using unlabeled data
Abstract
Vision-language models (VLMs) pre-trained on large, heterogeneous data sources are becoming increasingly popular, providing rich multi-modal embeddings that enable efficient transfer to new tasks. A particularly relevant application is few-shot adaptation, where only a handful of annotated examples are available to adapt the model through multi-modal linear probes. In medical imaging, specialized VLMs have shown promising performance in zero- and few-shot image classification, which is valuable for mitigating the high cost of expert annotations. However, challenges remain in extremely low-shot regimes: the inherent class imbalances in medical tasks often lead to underrepresented categories, penalizing overall model performance. To address this limitation, we propose leveraging unlabeled data by introducing an efficient semi-supervised solver that propagates text-informed pseudo-labels…
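The abstract describes propagating text-informed pseudo-labels from unlabeled data into a few-shot classifier, but it does not give the solver itself. The block below is a minimal, hypothetical sketch of that general idea and is not the authors' method. It uses a swapped-in technique: EM-style soft prototype refinement (soft k-means with a text prior). It assumes that image and class-text embeddings have already been computed by some VLM. Class prototypes start at the text embeddings, which is the zero-shot initialization. Each iteration assigns soft pseudo-labels to unlabeled images and then re-estimates every prototype from three sources: its text embedding, its labeled shots, and the pseudo-label-weighted unlabeled images. The names `semi_supervised_prototypes`, `temperature`, `text_weight`, and `n_iters` are illustrative assumptions.

```python
import math


def normalize(v):
    # L2-normalize a vector (embeddings are compared by cosine similarity)
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def semi_supervised_prototypes(labeled, labels, unlabeled, text_emb,
                               temperature=10.0, text_weight=1.0, n_iters=5):
    """Hypothetical EM-style solver (not the paper's): refine class prototypes
    using a text prior, few labeled shots, and soft pseudo-labels."""
    text_emb = [normalize(t) for t in text_emb]
    labeled = [normalize(x) for x in labeled]
    unlabeled = [normalize(x) for x in unlabeled]
    protos = list(text_emb)  # zero-shot init: prototypes = text embeddings
    for _ in range(n_iters):
        # E-step: soft pseudo-labels from scaled cosine similarity to prototypes
        soft = [softmax([temperature * dot(x, p) for p in protos]) for x in unlabeled]
        # M-step: each prototype = text prior + its labeled shots + soft-weighted unlabeled
        new_protos = []
        for c, t in enumerate(text_emb):
            acc = [text_weight * ti for ti in t]
            for x, y in zip(labeled, labels):
                if y == c:
                    acc = [a + xi for a, xi in zip(acc, x)]
            for x, q in zip(unlabeled, soft):
                acc = [a + q[c] * xi for a, xi in zip(acc, x)]
            new_protos.append(normalize(acc))
        protos = new_protos
    # final pseudo-labels under the refined prototypes
    soft = [softmax([temperature * dot(x, p) for p in protos]) for x in unlabeled]
    return protos, soft


def predict(x, protos):
    # assign x to the class whose prototype has the highest cosine similarity
    x = normalize(x)
    scores = [dot(x, p) for p in protos]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2-D example: class 1 has no labeled shots (an underrepresented class)
text_emb = [[1.0, 0.0], [0.0, 1.0]]
labeled, labels = [[0.9, 0.1]], [0]
unlabeled = [[0.1, 0.9], [0.2, 0.8], [0.95, 0.05]]
protos, soft = semi_supervised_prototypes(labeled, labels, unlabeled, text_emb)
print(predict([0.15, 0.85], protos), predict([0.85, 0.2], protos))
```

In this toy case, the class with zero labeled shots still gets a usable prototype. Its text embedding supplies the starting point, and the unlabeled images that land near it reinforce it. This mirrors, only loosely, the abstract's motivation about underrepresented categories.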
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
