Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?
Tilemachos Aravanis, Vladan Stojnić, Bill Psomas, Nikos Komodakis, Giorgos Tolias

TL;DR
This paper introduces a few-shot open-vocabulary segmentation method that augments text prompts with a support set of pixel-annotated images and uses a retrieval-augmented test-time adapter to improve segmentation accuracy, narrowing the gap to fully supervised methods.
Contribution
It proposes a retrieval-augmented test-time adapter with learned fusion for few-shot open-vocabulary segmentation, supporting continual support set expansion and fine-grained tasks.
Findings
Significantly narrows the gap between zero-shot and supervised segmentation.
Supports continually expanding support sets and personalized segmentation.
Achieves stronger modality synergy through learned, per-query fusion.
Abstract
Open-vocabulary segmentation (OVS) extends the zero-shot recognition capabilities of vision-language models (VLMs) to pixel-level prediction, enabling segmentation of arbitrary categories specified by text prompts. Despite recent progress, OVS lags behind fully supervised approaches due to two challenges: the coarse image-level supervision used to train VLMs and the semantic ambiguity of natural language. We address these limitations by introducing a few-shot setting that augments textual prompts with a support set of pixel-annotated images. Building on this, we propose a retrieval-augmented test-time adapter that learns a lightweight, per-image classifier by fusing textual and visual support features. Unlike prior methods relying on late, hand-crafted fusion, our approach performs learned, per-query fusion, achieving stronger synergy between modalities. The method supports continually…
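The idea of a per-image classifier built from retrieved support features and text embeddings can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`retrieve`, `fit_per_image_classifier`, `segment`), the top-k cosine retrieval, the text-initialized linear classifier, and all hyperparameters are assumptions chosen for illustration, and the features are assumed to be precomputed patch and text embeddings living in a shared space.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def retrieve(query, support, support_labels, k):
    """Collect the union of the k most similar support features per query patch."""
    sims = query @ support.T                            # (N, M) cosine similarities
    idx = np.unique(np.argsort(-sims, axis=1)[:, :k])   # deduplicated support indices
    return support[idx], support_labels[idx]

def fit_per_image_classifier(text_emb, feats, labels, steps=200, lr=0.02, scale=10.0):
    """Fit a linear classifier on retrieved visual features plus text embeddings.

    Assumption: fusion is modeled by training on both modalities jointly,
    with the weights initialized from the text embeddings.
    """
    num_classes = text_emb.shape[0]
    X = np.concatenate([feats, text_emb], axis=0)
    y = np.concatenate([labels, np.arange(num_classes)])  # one text example per class
    Y = np.eye(num_classes)[y]                            # one-hot targets
    W = text_emb.copy()                                   # start from the zero-shot classifier
    for _ in range(steps):
        P = softmax(scale * X @ W.T)
        grad = scale * (P - Y).T @ X / len(X)             # cross-entropy gradient w.r.t. W
        W -= lr * grad
    return W

def segment(query, text_emb, support, support_labels, k=5):
    """Predict one class index per query patch with a classifier fitted for this image."""
    query, text_emb, support = map(l2_normalize, (query, text_emb, support))
    feats, labels = retrieve(query, support, support_labels, k)
    W = fit_per_image_classifier(text_emb, feats, labels)
    return np.argmax(query @ W.T, axis=1)
```

Here `query` is an (N, D) array of patch features for one test image, `support` an (M, D) array of support patch features with integer class labels in `support_labels`, and `text_emb` a (C, D) array with one prompt embedding per class. Because the classifier is refitted for every test image from whatever the support set currently contains, growing the support set requires no retraining step in this sketch.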
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
