Test-time Contrastive Concepts for Open-world Semantic Segmentation with Vision-Language Models
Monika Wysoczańska, Antonin Vobecky, Amaia Cardiel, Tomasz Trzciński, Renaud Marlet, Andrei Bursuc, Oriane Siméoni

TL;DR
This paper introduces test-time contrastive concepts for open-world semantic segmentation using vision-language models, enabling segmentation of a single concept from a textual prompt without exhaustive concept lists.
Contribution
It proposes two methods for generating contrastive concepts at test time, one leveraging the text distribution of the training set and the other using LLM prompts, which improve segmentation in realistic scenarios.
Findings
Effective segmentation of single concepts with minimal prior information
Test-time contrastive concepts outperform baseline methods
New evaluation metric for single-concept segmentation
Abstract
Recent CLIP-like Vision-Language Models (VLMs), pre-trained on large amounts of image-text pairs to align both modalities with a simple contrastive objective, have paved the way to open-vocabulary semantic segmentation. Given an arbitrary set of textual queries, each image pixel is assigned to the closest query in feature space. However, this works well only when a user exhaustively lists all possible visual concepts in an image, so that they contrast against each other for the assignment. This corresponds to the current evaluation setup in the literature, which relies on having access to a list of in-domain relevant concepts, typically the classes of a benchmark dataset. Here, we consider the more challenging (and realistic) scenario of segmenting a single concept, given a textual prompt and nothing else. To achieve good results, besides contrasting with the generic 'background' text, we propose two different…
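The per-pixel assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 2-D toy embeddings, and the "water" contrastive concept are all assumptions made for illustration, and real systems would use VLM text and patch features instead. The sketch only shows why contrasting a single query against 'background' alone can over-segment, and how an extra query-specific contrastive concept changes the assignment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_pixels(pixel_feats, text_feats):
    """For each pixel feature, return the index of the closest text embedding."""
    return [
        max(range(len(text_feats)), key=lambda k: cosine(p, text_feats[k]))
        for p in pixel_feats
    ]

def segment_single_concept(pixel_feats, query_feat, contrastive_feats):
    """Binary mask: 1 where the query beats every contrastive concept."""
    labels = assign_pixels(pixel_feats, [query_feat] + list(contrastive_feats))
    return [1 if k == 0 else 0 for k in labels]

# Toy 2-D embeddings (made-up values, not real VLM features).
query = [1.0, 0.0]        # the single concept the user asks for
background = [0.0, 1.0]   # generic 'background' text
water = [0.7, 0.7]        # hypothetical query-specific contrastive concept
pixels = [[0.9, 0.1], [0.6, 0.5], [0.1, 0.9]]

# Contrasting with 'background' only: the ambiguous middle pixel goes to the query.
mask_bg_only = segment_single_concept(pixels, query, [background])

# Adding a contrastive concept lets that pixel be claimed by something else.
mask_with_cc = segment_single_concept(pixels, query, [background, water])

print(mask_bg_only, mask_with_cc)
```

In this toy setup the middle pixel is closer to the query than to 'background', so it is included in the mask until a closer contrastive concept is available to absorb it.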
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training · ALIGN
