The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models

Lijun Sheng; Jian Liang; Ran He; Zilei Wang; Tieniu Tan

arXiv:2506.24000 · cs.LG · October 14, 2025

TL;DR

This paper introduces TTA-VLM, a comprehensive benchmark for evaluating test-time adaptation methods on vision-language models. It reveals limited gains and trade-offs in trustworthiness, and aims to foster more reliable future strategies.

Contribution

The paper presents TTA-VLM, a unified benchmark with diverse evaluation metrics, extending analysis beyond CLIP to SigLIP and training-time tuning methods for fair comparison.

Findings

01. Existing TTA methods show limited improvements.

02. Poor synergy between TTA and training-time fine-tuning.

03. Accuracy gains often reduce model trustworthiness.

Abstract

Test-time adaptation (TTA) methods have gained significant attention for enhancing the performance of vision-language models (VLMs) such as CLIP during inference, without requiring additional labeled data. However, current TTA research generally suffers from major limitations such as duplication of baseline results, limited evaluation metrics, inconsistent experimental settings, and insufficient analysis. These problems hinder fair comparisons between TTA methods and make it difficult to assess their practical strengths and weaknesses. To address these challenges, we introduce TTA-VLM, a comprehensive benchmark for evaluating TTA methods on VLMs. Our benchmark implements 8 episodic TTA and 7 online TTA methods within a unified and reproducible framework, and evaluates them across 15 widely used datasets. Unlike prior studies focused solely on CLIP, we extend the evaluation to SigLIP--a…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis