Frustratingly Easy Test-Time Adaptation of Vision-Language Models

Matteo Farina; Gianni Franchi; Giovanni Iacca; Massimiliano Mancini; Elisa Ricci

arXiv:2405.18330 · cs.CV · November 5, 2024 · 1 citation

TL;DR

This paper introduces ZERO, a simple, fast, and memory-efficient test-time adaptation method for vision-language models that significantly improves performance without backpropagation.

Contribution

The authors show that a strong TTA method is hidden within prompt tuning by Marginal Entropy Minimization, and propose ZERO, which requires only a single forward pass and no backpropagation while outperforming existing methods.

Findings

01 ZERO surpasses state-of-the-art TTA methods in accuracy.

02 ZERO is nearly 10x faster and 13x more memory-efficient than standard test-time prompt tuning.

03 ZERO is a strong, simple baseline for future TTA research.

Abstract

Vision-Language Models seamlessly discriminate among arbitrary semantic categories, yet they still suffer from poor generalization when presented with challenging examples. For this reason, Episodic Test-Time Adaptation (TTA) strategies have recently emerged as powerful techniques to adapt VLMs in the presence of a single unlabeled image. The recent literature on TTA is dominated by the paradigm of prompt tuning by Marginal Entropy Minimization, which, relying on online backpropagation, inevitably slows down inference while increasing memory. In this work, we theoretically investigate the properties of this approach and unveil that a surprisingly strong TTA method lies dormant and hidden within it. We term this approach ZERO (TTA with "zero" temperature), whose design is both incredibly effective and frustratingly simple: augment N times, predict, retain the most confident predictions,…
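
The steps named in the abstract (augment N times, predict, retain the most confident predictions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the use of entropy as the confidence measure, the `keep_fraction` parameter, and the final step, which reads "zero temperature" as replacing each retained softmax output with a one-hot vector at its argmax, so that averaging the retained views reduces to a majority vote. The logits are assumed to come from any vision-language model applied to N augmented views of one image.

```python
import math
from collections import Counter


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def zero_style_predict(view_logits, keep_fraction=0.5):
    """Predict one class index from the logits of N augmented views.

    view_logits: list of N lists, one list of class logits per view.
    keep_fraction: hypothetical knob, the share of views to retain.
    """
    if not view_logits:
        raise ValueError("need at least one augmented view")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Rank views from most to least confident (assumption: low entropy).
    ranked = sorted(view_logits, key=lambda l: entropy(softmax(l)))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:n_keep]

    # Assumed zero-temperature limit: each retained view contributes a
    # one-hot vote for its argmax class, so no gradients are needed.
    votes = Counter(max(range(len(l)), key=l.__getitem__) for l in kept)
    return votes.most_common(1)[0][0]
```

Calling `zero_style_predict(logits_per_view, keep_fraction=0.5)` on the stacked outputs of one batched forward pass returns a single class index. The sketch involves no backward pass, which is consistent with the speed and memory claims above, but the filtering criterion and retained fraction used in the paper should be checked against the official repository.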

Code & Models

Repositories

farinamatteo/zero (PyTorch, official)

Videos

Frustratingly Easy Test-Time Adaptation of Vision-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Softmax