ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates

Hamidreza Dastmalchi, Aijun An, Ali Cheraghian

arXiv:2508.05898 · cs.CV · August 11, 2025


TL;DR

ETTA introduces a dynamic, recursive embedding update mechanism for vision-language models that enhances test-time adaptation efficiency and accuracy by integrating all incoming test data and reducing prompt dependency.

Contribution

The paper proposes ETTA, a test-time adaptation method that combines recursive embedding updates with an adaptive ensemble module, improving on existing cache-based approaches for vision-language models.
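This section does not spell out how ETTA's adaptive ensemble weights its members. To make the general idea concrete, here is a minimal sketch of one common scheme, an entropy-weighted ensemble over prompt templates: each template yields class logits for a test image, and templates whose predictions are more confident (lower entropy) receive larger weights. The function names `adaptive_ensemble`, `softmax`, and `entropy`, and the weighting rule itself, are hypothetical illustrations, not the authors' implementation.

```python
import math


def softmax(xs, temperature=1.0):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    # Shannon entropy (natural log) of a probability vector.
    return -sum(q * math.log(q) for q in p if q > 0)


def adaptive_ensemble(per_template_logits, temperature=1.0):
    """Hypothetical confidence-weighted ensemble over prompt templates.

    per_template_logits: one list of class logits per prompt template.
    Each template's weight is proportional to exp(-H(p_t) / temperature),
    so lower-entropy (more confident) templates count for more.
    """
    probs = [softmax(logits) for logits in per_template_logits]
    weights = softmax([-entropy(p) for p in probs], temperature)
    n_classes = len(per_template_logits[0])
    # Weighted average of per-template class probabilities.
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]
```

For example, with one confident template (`[5.0, 0.0, 0.0]`) and one uninformative template (`[0.0, 0.0, 0.0]`), the combined probability of class 0 ends up above the plain average of the two, which is how an adaptive ensemble can reduce dependence on any single hand-written prompt.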

Findings

1. ETTA outperforms state-of-the-art TTA models in accuracy.

2. ETTA reduces computational complexity and memory usage.

3. ETTA effectively adapts to distribution shifts in benchmark tests.

Abstract

Pretrained vision-language models (VLMs) like CLIP show strong zero-shot performance but struggle with generalization under distribution shifts. Test-Time Adaptation (TTA) addresses this by adapting VLMs to unlabeled test data in new domains. While some TTA methods rely on prompt-tuning, training-free cache-based approaches are preferred for efficiency. However, current cache-based TTA models store only a limited set of high-confidence samples, restricting the decision boundary to these samples and ignoring the influence of other incoming test data. To address this, we propose Efficient Test-Time Adaptation (ETTA), introducing a Recursive Updating module that integrates all incoming test samples, progressively refining the decision boundary. This strategy mimics an unbounded cache, dynamically updating contextual embeddings for improved accuracy with minimal memory and computational…
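The difference between a bounded cache and "an unbounded cache" can be illustrated with a running mean. The sketch below is a minimal illustration under stated assumptions, not the paper's Recursive Updating module: each class keeps one prototype initialized from its text embedding, every incoming test embedding is pseudo-labeled by cosine similarity, and it is then folded into that class's running mean. The class name `RecursivePrototypes`, the initialization rule, and the choice to count the text embedding as one sample are all hypothetical. Because the incremental update m_n = m_(n-1) + (z - m_(n-1)) / n yields exactly the mean of all n samples seen so far, the prototype behaves like the average of an unbounded per-class cache while storing only one vector per class.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class RecursivePrototypes:
    """Hypothetical sketch: one running-mean embedding per class, updated with
    every test sample pseudo-labeled as that class (not the authors' code)."""

    def __init__(self, text_embeddings):
        # Assumption: each prototype starts as its normalized text embedding,
        # which counts as the first "sample" of that class.
        self.protos = [l2_normalize(t) for t in text_embeddings]
        self.counts = [1] * len(text_embeddings)

    def predict(self, image_embedding):
        # Cosine similarity between the test embedding and each prototype.
        z = l2_normalize(image_embedding)
        scores = [dot(z, l2_normalize(p)) for p in self.protos]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, image_embedding):
        # Pseudo-label the sample, then fold it into that class's running mean.
        z = l2_normalize(image_embedding)
        k = self.predict(z)
        self.counts[k] += 1
        n = self.counts[k]
        self.protos[k] = [m + (x - m) / n for m, x in zip(self.protos[k], z)]
        return k
```

Memory here grows with the number of classes times the embedding size, not with the number of test samples, whereas a conventional cache keeps a fixed number of high-confidence samples per class and discards the rest.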

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling