A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-Time Adaptation for Vision-Language Models

Mario Döbler; Robert A. Marsden; Tobias Raichle; Bin Yang

arXiv:2405.14977·cs.CV·September 10, 2024

TL;DR

This paper systematically evaluates online test-time adaptation techniques for vision-language models like CLIP, highlighting their potential and limitations in improving robustness under distribution shifts through various prompt and ensemble strategies.

Contribution

It provides a comprehensive comparison of prompt-based and test-time adaptation methods for vision-language models, introducing a vision-text ensemble approach to enhance robustness.

Findings

01. Test-time adaptation improves model robustness under distribution shifts.

02. Ensemble strategies outperform single prompt methods.

03. Existing adaptation methods have limitations in real-world scenarios.

Abstract

In deep learning, maintaining model robustness against distribution shifts is critical. This work explores a broad range of possibilities to adapt vision-language foundation models at test time, with a particular emphasis on CLIP and its variants. The study systematically examines prompt-based techniques and existing test-time adaptation methods, aiming to improve robustness under distribution shift in diverse real-world scenarios. Specifically, the investigation covers various prompt engineering strategies, including handcrafted prompts, prompt ensembles, and prompt learning techniques. Additionally, we introduce a vision-text-space ensemble that substantially enhances average performance compared to text-space-only ensembles. Since online test-time adaptation has been shown to be effective at mitigating performance drops under distribution shift, the study extends its scope to evaluate…
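To make the idea of a text-space prompt ensemble concrete, here is a minimal sketch of the commonly used recipe for CLIP-style zero-shot classification: embed each class name with several prompt templates, average the normalized text embeddings per class, re-normalize, and classify images by cosine similarity. This is not the authors' implementation, and it does not cover their vision-text-space ensemble. The function names, the synthetic embeddings that stand in for real encoder outputs, and the `logit_scale` value of 100 are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def ensemble_text_prototypes(text_emb):
    """Average prompt embeddings per class (hypothetical helper).

    text_emb: array of shape (num_templates, num_classes, dim), one text
    embedding per (prompt template, class) pair.
    Returns unit-norm class prototypes of shape (num_classes, dim).
    """
    per_prompt = l2_normalize(text_emb)            # normalize each prompt embedding
    return l2_normalize(per_prompt.mean(axis=0))   # average over templates, re-normalize


def zero_shot_predict(image_emb, prototypes, logit_scale=100.0):
    """Classify images by scaled cosine similarity to the class prototypes."""
    logits = logit_scale * l2_normalize(image_emb) @ prototypes.T
    return logits.argmax(axis=1), logits


# Synthetic stand-ins for encoder outputs (assumption: no real model is loaded).
rng = np.random.default_rng(0)
num_templates, num_classes, dim = 4, 3, 8
class_dirs = np.eye(num_classes, dim)  # one orthogonal direction per class
text_emb = class_dirs[None] + 0.05 * rng.normal(size=(num_templates, num_classes, dim))
image_emb = class_dirs + 0.05 * rng.normal(size=(num_classes, dim))

prototypes = ensemble_text_prototypes(text_emb)
preds, logits = zero_shot_predict(image_emb, prototypes)
print(preds)
```

With a real model, `text_emb` would come from encoding prompts such as "a photo of a {class}" under each template, and `image_emb` from the image encoder. Averaging in embedding space keeps the classifier at one prototype per class, so inference cost does not grow with the number of templates.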

Code & Models

Repositories

mariodoebler/test-time-adaptation
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Focus · Contrastive Language-Image Pre-training