Test-time Contrastive Concepts for Open-world Semantic Segmentation with Vision-Language Models

Monika Wysoczańska; Antonin Vobecky; Amaia Cardiel; Tomasz Trzciński; Renaud Marlet; Andrei Bursuc; Oriane Siméoni

arXiv:2407.05061 · cs.CV · June 17, 2025

TL;DR

This paper introduces test-time contrastive concepts for open-world semantic segmentation using vision-language models, enabling segmentation of a single concept from a textual prompt without exhaustive concept lists.

Contribution

It proposes two methods to generate test-time contrastive concepts leveraging training set text distribution and LLM prompts, improving segmentation in realistic scenarios.

Findings

01 Effective segmentation of single concepts with minimal prior info

02 Test-time contrastive concepts outperform baseline methods

03 New evaluation metric for single-concept segmentation

Abstract

Recent CLIP-like Vision-Language Models (VLMs), pre-trained on large amounts of image-text pairs to align both modalities with a simple contrastive objective, have paved the way for open-vocabulary semantic segmentation. Given an arbitrary set of textual queries, each image pixel is assigned the closest query in feature space. However, this works well only when a user exhaustively lists all possible visual concepts in an image, so that they contrast against each other for the assignment. This corresponds to the current evaluation setup in the literature, which relies on having access to a list of in-domain relevant concepts, typically the classes of a benchmark dataset. Here, we consider the more challenging (and realistic) scenario of segmenting a single concept, given a textual prompt and nothing else. To achieve good results, besides contrasting with the generic 'background' text, we propose two different…
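
The assignment rule described in the abstract (each pixel takes the closest textual query in feature space) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-D vectors, and the extra contrastive concept are all hypothetical, and real systems would use dense VLM features rather than hand-written lists. The sketch only shows why adding contrastive concepts beyond 'background' can change which pixels are kept for a single queried concept.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def assign_pixels(pixel_feats, text_feats):
    """For each pixel feature, return the index of the most similar text feature."""
    return [
        max(range(len(text_feats)), key=lambda k: cosine(p, text_feats[k]))
        for p in pixel_feats
    ]

def segment_single_concept(pixel_feats, query_feat, contrast_feats):
    """Return a mask that is True where the query beats every contrastive concept."""
    labels = assign_pixels(pixel_feats, [query_feat] + list(contrast_feats))
    return [label == 0 for label in labels]

# Toy 2-D "embeddings", made up for illustration (not real CLIP features).
query = [1.0, 0.0]            # the single prompted concept
background = [0.0, 1.0]       # the generic 'background' text
extra_contrast = [0.7, 0.7]   # a hypothetical additional contrastive concept

pixels = [[0.9, 0.1], [0.6, 0.5], [0.1, 0.9]]

# Contrasting against 'background' only.
mask_bg_only = segment_single_concept(pixels, query, [background])
# Contrasting against 'background' plus one more concept.
mask_with_cc = segment_single_concept(pixels, query, [background, extra_contrast])

print(mask_bg_only)   # [True, True, False]
print(mask_with_cc)   # [True, False, False]
```

In this toy setup the middle pixel is attributed to the query when 'background' is the only alternative, but is claimed by the additional contrastive concept once it is included. How such contrastive concepts are actually generated is the subject of the paper and is not reproduced here.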

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Sparse Evolutionary Training · ALIGN