Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation

Mehrdad Noori; David Osowiechi; Gustavo Adolfo Vargas Hakim; Ali Bahri; Moslem Yazdanpanah; Sahar Dastani; Farzad Beizaee; Ismail Ben Ayed; Christian Desrosiers

arXiv:2505.21844 · cs.CV · November 11, 2025

TL;DR

This paper introduces a novel test-time adaptation method for open-vocabulary semantic segmentation using vision-language models, improving performance without additional training data across diverse datasets and conditions.

Contribution

It proposes a multi-level, multi-prompt entropy minimization approach tailored for segmentation, and establishes a comprehensive benchmark suite for evaluating TTA in open-vocabulary segmentation.

Findings

01 Our method outperforms TTA classification baselines across multiple datasets.

02 It remains effective with a single test sample and no extra training.

03 The benchmark suite includes 87 diverse test scenarios.

Abstract

Recently, test-time adaptation (TTA) has attracted wide interest in the context of vision-language models (VLMs) for image classification. However, to the best of our knowledge, the problem has been completely overlooked in dense prediction tasks such as Open-Vocabulary Semantic Segmentation (OVSS). In response, we propose a novel TTA method tailored to adapting VLMs for segmentation at test time. Unlike TTA methods for image classification, our Multi-Level and Multi-Prompt (MLMP) entropy minimization integrates features from intermediate vision-encoder layers and is performed with different text-prompt templates at both the global CLS token and local pixel-wise levels. Our approach can be used as a plug-and-play module for any segmentation network, does not require additional training data or labels, and remains effective even with a single test sample. Furthermore, we introduce a comprehensive OVSS TTA…
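
The abstract describes entropy minimization computed with several text-prompt templates at both the global CLS-token level and the local pixel-wise level. The pure-Python sketch below illustrates only the quantity such an objective would drive down, not the authors' implementation. The function names, the nested-list layout of the logits, the simple averaging over templates, and the unweighted sum of the global and local terms are all assumptions made for illustration. The multi-level fusion of intermediate encoder layers and the gradient update itself are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy_over_prompts(logits):
    """Average prediction entropy over templates and locations.

    logits[t][i][k] is the image-text similarity logit for prompt
    template t, location i, and class k (hypothetical layout).
    Averaging over templates is an assumed aggregation choice.
    """
    values = [
        entropy(softmax(class_logits))
        for per_template in logits
        for class_logits in per_template
    ]
    return sum(values) / len(values)


def global_local_entropy(cls_logits, pixel_logits):
    """Assumed objective: global (CLS) term plus local (pixel-wise) term.

    cls_logits[t][k] holds one prediction per template, so it is
    wrapped as a single "location" to reuse the same helper.
    """
    global_term = mean_entropy_over_prompts([[row] for row in cls_logits])
    local_term = mean_entropy_over_prompts(pixel_logits)
    return global_term + local_term
```

A test-time adaptation loop would minimize a quantity of this kind with respect to a small set of model parameters, using only the unlabeled test input. With uniform logits over K classes each term equals log K, its maximum, and it approaches zero as predictions become confident.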

Code & Models

Repositories

dosowiechi/mlmp (PyTorch, official)

Videos

Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications