Adaptive Cache Enhancement for Test-Time Adaptation of Vision-Language Models

Khanh-Binh Nguyen; Phuoc-Nguyen Bui; Hyunseung Choo; Duc Thanh Nguyen

arXiv:2508.07570 · cs.CV · November 17, 2025

Open Access

TL;DR

This paper introduces ACE, a novel cache-based test-time adaptation framework for vision-language models that dynamically constructs class-specific caches to improve robustness and accuracy under distribution shifts.

Contribution

ACE employs class-wise thresholds and iterative refinement to build a robust cache, enabling adaptive decision boundaries and enhanced out-of-distribution performance.
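The idea of a class-wise threshold can be illustrated with a minimal sketch. It is not the authors' implementation: the `ClassWiseGate` name, the shared initial threshold, and the moving-average refinement rule are all assumptions chosen for illustration, since this summary does not specify how ACE initializes or refines its thresholds.

```python
import numpy as np


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())


class ClassWiseGate:
    """Hypothetical per-class entropy gate for cache admission.

    Assumption: every class starts from the same threshold, and a class's
    threshold moves toward the entropy of each sample it admits.
    """

    def __init__(self, num_classes, init_threshold, momentum=0.9):
        self.thresholds = np.full(num_classes, init_threshold, dtype=float)
        self.momentum = momentum

    def admit(self, probs):
        """Return (predicted_class, accepted) for one test sample."""
        cls = int(np.argmax(probs))
        h = entropy(probs)
        accepted = bool(h <= self.thresholds[cls])
        if accepted:
            # Assumed refinement rule: moving average over admitted entropies.
            m = self.momentum
            self.thresholds[cls] = m * self.thresholds[cls] + (1.0 - m) * h
        return cls, accepted
```

Because each class keeps its own threshold, a class whose predictions are consistently confident ends up with a stricter gate than one whose predictions are noisier, which is the intuition behind per-class rather than global thresholds.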

Findings

01 Achieves state-of-the-art results on 15 benchmarks.

02 Demonstrates superior robustness under distribution shifts.

03 Outperforms existing TTA methods in diverse scenarios.

Abstract

Vision-language models (VLMs) exhibit remarkable zero-shot generalization but suffer performance degradation under distribution shifts in downstream tasks, particularly in the absence of labeled data. Test-Time Adaptation (TTA) addresses this challenge by enabling online optimization of VLMs during inference, eliminating the need for annotated data. Cache-based TTA methods exploit historical knowledge by maintaining a dynamic memory cache of low-entropy or high-confidence samples, promoting efficient adaptation to out-of-distribution data. Nevertheless, these methods face two critical challenges: (1) unreliable confidence metrics under significant distribution shifts, resulting in error accumulation within the cache and degraded adaptation performance; and (2) rigid decision boundaries that fail to accommodate substantial distributional variations, leading to suboptimal predictions. To…
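The generic cache-based TTA mechanism described in the abstract (keeping low-entropy test samples and reusing them at prediction time) can be sketched as follows. This is a simplified illustration, not ACE itself: the `EntropyCache` class, the per-class capacity, and the exponential affinity with hypothetical parameters `alpha` and `beta` are assumptions made for the example.

```python
import numpy as np


class EntropyCache:
    """Hypothetical per-class cache that keeps the lowest-entropy features."""

    def __init__(self, num_classes, capacity=3):
        self.num_classes = num_classes
        self.capacity = capacity
        # class index -> list of (entropy, unit-norm feature), sorted by entropy
        self.slots = {c: [] for c in range(num_classes)}

    def update(self, feature, probs):
        """Store a test feature under its pseudo-label, evicting high-entropy entries."""
        p = np.clip(probs, 1e-12, 1.0)
        h = float(-(p * np.log(p)).sum())
        cls = int(np.argmax(probs))
        feat = feature / np.linalg.norm(feature)
        self.slots[cls].append((h, feat))
        self.slots[cls].sort(key=lambda item: item[0])
        del self.slots[cls][self.capacity:]

    def logits(self, feature, alpha=1.0, beta=5.0):
        """Per-class cache logits from an assumed exponential cosine affinity."""
        feat = feature / np.linalg.norm(feature)
        out = np.zeros(self.num_classes)
        for cls, items in self.slots.items():
            for _, key in items:
                out[cls] += alpha * np.exp(-beta * (1.0 - float(feat @ key)))
        return out
```

In a pipeline of this kind, the cache logits would typically be added to the model's zero-shot logits before taking the argmax. The sketch also makes the first challenge named in the abstract concrete: if the pseudo-label from `np.argmax(probs)` is wrong but confident, the wrong feature enters the cache and biases later predictions.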

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis