Vision-Language Enhanced Foundation Model for Semi-supervised Medical Image Segmentation
Jiaqi Guo, Mingzhen Li, Hanyu Su, Santiago López, Lexiaozi Fan, Daniel Kim, and Aggelos Katsaggelos

TL;DR
This paper introduces VESSA, a vision-language enhanced foundation model that improves semi-supervised medical image segmentation by leveraging visual-semantic understanding and iterative pseudo-label refinement, significantly boosting accuracy with limited annotations.
Contribution
The work presents a novel VLM-based segmentation foundation model integrated into SSL, enabling effective semantic feature matching and iterative pseudo-label refinement for medical image segmentation.
Findings
VESSA outperforms existing methods on multiple datasets.
Significant accuracy improvements under limited annotations.
Effective integration of vision-language models into SSL frameworks.
Abstract
Semi-supervised learning (SSL) has emerged as an effective paradigm for medical image segmentation, reducing the reliance on extensive expert annotations. Meanwhile, vision-language models (VLMs) have demonstrated strong generalization and few-shot capabilities across diverse visual domains. In this work, we integrate VLM-based segmentation into semi-supervised medical image segmentation by introducing a Vision-Language Enhanced Semi-supervised Segmentation Assistant (VESSA) that incorporates foundation-level visual-semantic understanding into SSL frameworks. Our approach consists of two stages. In Stage 1, the VLM-enhanced segmentation foundation model VESSA is trained as a reference-guided segmentation assistant using a template bank containing gold-standard exemplars, simulating learning from limited labeled data. Given an input-template pair, VESSA performs visual feature matching…
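The abstract is cut off before it says how VESSA's visual feature matching works, so the following is only a rough sketch of what reference-guided matching between an input and a template could look like. It uses masked average pooling and cosine similarity, a generic prototype-matching technique from few-shot segmentation, and is not claimed to be VESSA's mechanism. Every name here (`foreground_prototype`, `match_to_template`, the `threshold` value, the toy feature vectors) is a hypothetical chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def foreground_prototype(features: List[Vector], mask: List[int]) -> List[float]:
    """Masked average pooling: mean feature over template pixels labeled 1."""
    selected = [f for f, m in zip(features, mask) if m == 1]
    if not selected:
        raise ValueError("template mask has no foreground pixels")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def match_to_template(
    query_features: List[Vector],
    template_features: List[Vector],
    template_mask: List[int],
    threshold: float = 0.8,  # assumed cutoff, not taken from the paper
) -> List[int]:
    """Label each query pixel 1 if it resembles the template foreground."""
    prototype = foreground_prototype(template_features, template_mask)
    return [1 if cosine(q, prototype) >= threshold else 0 for q in query_features]


# Toy data: four template "pixels" with 2-D features and a gold-standard mask,
# plus three query pixels to segment. Real features would come from an encoder.
template_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
template_mask = [1, 1, 0, 0]
query_features = [[0.8, 0.2], [0.0, 1.0], [1.0, 0.1]]

predicted = match_to_template(query_features, template_features, template_mask)
print(predicted)
```

In this toy setup the first and third query pixels point in nearly the same direction as the template foreground prototype and are labeled foreground, while the second is labeled background. In a semi-supervised pipeline, a mask obtained this way could serve as a starting pseudo-label for later refinement, though the paper's actual procedure may differ.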
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
