Realistic Test-Time Adaptation of Vision-Language Models

Maxime Zanella; Clément Fuchs; Christophe De Vleeschouwer; Ismail Ben Ayed

arXiv:2501.03729 · cs.CV · January 8, 2025

TL;DR

This paper evaluates the limitations of current test-time adaptation methods for vision-language models under realistic scenarios and introduces StatA, a new method that maintains robustness across variable class distributions during deployment.

Contribution

The paper proposes a realistic evaluation framework for TTA of VLMs and introduces StatA, a versatile adaptation method with a novel regularization to preserve initial knowledge.

Findings

01. Current TTA methods compromise zero-shot robustness in realistic scenarios.

02. StatA effectively handles variable class distributions during test time.

03. StatA maintains model robustness across diverse deployment conditions.

Abstract

The zero-shot capabilities of Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. However, previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data distribution, such as the presence of all classes. Our work challenges these favorable deployment scenarios, and introduces a more realistic evaluation framework, including: (i) a variable number of effective classes for adaptation within a single batch, and (ii) non-i.i.d. batches of test samples in online adaptation settings. We provide comprehensive evaluations, comparisons, and ablation studies that demonstrate how current transductive or TTA methods for VLMs systematically compromise the models' initial zero-shot robustness across various realistic scenarios, favoring performance gains under advantageous assumptions about the test samples'…
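The two evaluation conditions named in the abstract, (i) batches containing only some of the classes and (ii) non-i.i.d. streams of test batches, can be illustrated with a small sampling sketch. This is not the paper's protocol or the StatA code: the function names `sample_batch` and `class_correlated_stream`, the toy label set, and the choice of a fully class-sorted stream as the non-i.i.d. case are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_batch(labels, num_effective_classes, batch_size, seed=0):
    """Hypothetical helper: draw a batch whose samples come from only
    `num_effective_classes` randomly chosen classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Choose which classes are allowed to appear in this batch.
    present = rng.sample(sorted(by_class), num_effective_classes)
    pool = [i for c in present for i in by_class[c]]
    # Sample without replacement, capped by the size of the pool.
    return rng.sample(pool, min(batch_size, len(pool)))


def class_correlated_stream(labels, batch_size, seed=0):
    """Hypothetical helper: yield batches in class-sorted order, an
    extreme (assumed) form of a non-i.i.d. online stream."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    rng.shuffle(classes)
    # Group all indices of one class together, class after class.
    order = [i for c in classes for i, y in enumerate(labels) if y == c]
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Toy label set (assumed): 10 classes, 20 samples each.
labels = [c for c in range(10) for _ in range(20)]

batch = sample_batch(labels, num_effective_classes=3, batch_size=32)
present_classes = {labels[i] for i in batch}  # at most 3 distinct classes

stream = list(class_correlated_stream(labels, batch_size=20))
```

In this toy setup, an adaptation method that assumes every class appears in each batch would see at most 3 of the 10 classes in `batch`, and a single class per batch in `stream`.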

Code & Models

Repositories

maxzanella/stata (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications