Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models

Xinyu Chen; Haotian Zhai; Can Zhang; Xiupeng Shi; Ruirui Li

arXiv:2508.01225 · cs.CV · August 25, 2025

Open Access

TL;DR

This paper introduces MCP, a multi-cache prototype learning framework for test-time adaptation (TTA) of vision-language models. MCP improves zero-shot generalization by combining different cache types to tighten intra-class compactness and improve prediction calibration.

Contribution

The paper proposes a multi-cache, prototype-based test-time adaptation (TTA) method built on three caches, and introduces MCP++, which incorporates cross-modal alignment and residual learning for better performance.

Findings

01. Achieves state-of-the-art results on 15 downstream tasks.

02. Demonstrates the effectiveness of the multi-cache strategy over single-cache methods.

03. Shows significant improvement in intra-class compactness and prediction calibration.

Abstract

In the zero-shot setting, test-time adaptation (TTA) adjusts pre-trained models using unlabeled data from the test phase to enhance performance on unknown test distributions. Existing cache-enhanced TTA methods rely on a low-entropy criterion to select samples for prototype construction, assuming intra-class compactness. However, low-entropy samples may be unreliable under distribution shifts, and the resulting prototypes may not ensure compact intra-class distributions. This study identifies a positive correlation between cache-enhanced performance and intra-class compactness. Based on this observation, we propose a Multi-Cache enhanced Prototype-based Test-Time Adaptation (MCP) featuring three caches: an entropy cache for initializing prototype representations with low-entropy samples, an align cache for integrating visual and textual information to achieve compact intra-class distributions,…
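
The low-entropy criterion that the abstract attributes to cache-enhanced TTA methods can be illustrated with a small sketch. This is not the authors' implementation: the class name `EntropyCache`, the `capacity` parameter, the use of the predicted class as a pseudo-label, and the `compactness` helper (mean cosine similarity to the prototype) are all assumptions made here for illustration. Only an entropy-style cache is shown; the align cache and the rest of MCP are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class EntropyCache:
    """Hypothetical per-class cache keeping the lowest-entropy test features."""

    def __init__(self, num_classes, capacity):
        self.capacity = capacity
        # class index -> list of (entropy, unit-norm feature), sorted by entropy
        self.slots = {c: [] for c in range(num_classes)}

    def update(self, feature, logits):
        """Insert one unlabeled test sample under its predicted class."""
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        h = entropy(probs)
        slot = self.slots[pred]
        slot.append((h, l2_normalize(feature)))
        slot.sort(key=lambda item: item[0])
        del slot[self.capacity:]  # evict the highest-entropy overflow
        return pred, h

    def prototype(self, cls):
        """Unit-norm mean of the cached features, or None if the slot is empty."""
        slot = self.slots[cls]
        if not slot:
            return None
        dim = len(slot[0][1])
        mean = [sum(f[i] for _, f in slot) / len(slot) for i in range(dim)]
        return l2_normalize(mean)


def compactness(cache, cls):
    """Assumed proxy: mean cosine similarity of cached features to the prototype."""
    proto = cache.prototype(cls)
    if proto is None:
        return None
    sims = [sum(a * b for a, b in zip(f, proto)) for _, f in cache.slots[cls]]
    return sum(sims) / len(sims)


# Toy usage with 2-D features and 2 classes (values are made up).
cache = EntropyCache(num_classes=2, capacity=2)
cache.update([1.0, 0.0], [5.0, 0.0])  # confident prediction, low entropy
cache.update([0.0, 1.0], [1.0, 0.0])  # uncertain prediction, evicted later
cache.update([0.9, 0.1], [3.0, 0.0])  # fairly confident prediction
```

In this toy run the uncertain sample is evicted once the class-0 slot exceeds its capacity, so the prototype is built from the two more confident samples. As the abstract notes, low entropy alone does not guarantee that the retained features are compact under distribution shift, which is why a separate compactness measure is sketched alongside the cache.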

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis