Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Tilemachos Aravanis; Vladan Stojnić; Bill Psomas; Nikos Komodakis; Giorgos Tolias

arXiv:2602.23339 · cs.CV · February 27, 2026

TL;DR

This paper introduces a few-shot setting for open-vocabulary segmentation in which text prompts are augmented with a support set of pixel-annotated images, and proposes a retrieval-augmented test-time adapter that narrows the accuracy gap to fully supervised methods.

Contribution

The paper proposes a retrieval-augmented test-time adapter for few-shot open-vocabulary segmentation that fuses textual and visual support features through learned, per-query fusion. The method also supports continually expanding support sets and fine-grained tasks.

Findings

01 Significantly narrows the gap between zero-shot and supervised segmentation.

02 Supports continually expanding support sets and personalized segmentation.

03 Achieves stronger modality synergy through learned, per-query fusion.

Abstract

Open-vocabulary segmentation (OVS) extends the zero-shot recognition capabilities of vision-language models (VLMs) to pixel-level prediction, enabling segmentation of arbitrary categories specified by text prompts. Despite recent progress, OVS lags behind fully supervised approaches due to two challenges: the coarse image-level supervision used to train VLMs and the semantic ambiguity of natural language. We address these limitations by introducing a few-shot setting that augments textual prompts with a support set of pixel-annotated images. Building on this, we propose a retrieval-augmented test-time adapter that learns a lightweight, per-image classifier by fusing textual and visual support features. Unlike prior methods relying on late, hand-crafted fusion, our approach performs learned, per-query fusion, achieving stronger synergy between modalities. The method supports continually…
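
To make the idea of a retrieval-augmented, per-image classifier more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and every name in it (`retrieve_support`, `fit_per_image_classifier`, `segment_query`, and the hyperparameters `k`, `steps`, `lr`, `scale`) is hypothetical. It assumes patch features and text embeddings have already been extracted by some VLM encoder (not shown) and live in the same space. As a simple stand-in for learned fusion, it initialises a linear classifier from the text embeddings and fits it on the retrieved visual support features together with the text embeddings; the paper's actual fusion mechanism may differ.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_support(query_feats, support_feats, k):
    """Return indices of the k support features most similar to any query patch."""
    sims = l2_normalize(query_feats) @ l2_normalize(support_feats).T  # (P, N)
    best = sims.max(axis=0)                                           # (N,)
    k = min(k, support_feats.shape[0])
    return np.argsort(-best)[:k]

def fit_per_image_classifier(text_embeds, feats, labels,
                             steps=100, lr=0.5, scale=10.0):
    """Fit a softmax classifier for one query image (illustrative stand-in).

    Weights start from the text embeddings (the zero-shot classifier) and are
    updated on retrieved visual features plus one text sample per class.
    """
    num_classes = text_embeds.shape[0]
    W = l2_normalize(text_embeds).copy()                              # (C, D)
    X = np.concatenate([l2_normalize(feats), l2_normalize(text_embeds)], axis=0)
    y = np.concatenate([labels, np.arange(num_classes)])
    Y = np.eye(num_classes)[y]                                        # one-hot
    for _ in range(steps):
        logits = scale * X @ W.T
        logits -= logits.max(axis=1, keepdims=True)                   # stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = scale * (probs - Y).T @ X / X.shape[0]                 # CE gradient
        W -= lr * grad
    return W

def segment_query(query_feats, text_embeds, support_feats, support_labels, k=32):
    """Predict a class index for every patch feature of a single query image."""
    idx = retrieve_support(query_feats, support_feats, k)
    W = fit_per_image_classifier(text_embeds, support_feats[idx], support_labels[idx])
    return (l2_normalize(query_feats) @ W.T).argmax(axis=1)
```

In this sketch, `query_feats` has shape `(P, D)` for `P` patches, `text_embeds` has shape `(C, D)` for `C` classes, and `support_feats` with `support_labels` hold the pixel-annotated support features pooled across support images. Because the classifier is refit per query from whatever is retrieved, appending new rows to the support arrays extends the system without retraining a shared model.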

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI