Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
Xinyu Chen, Haotian Zhai, Can Zhang, Xiupeng Shi, Ruirui Li

TL;DR
This paper introduces MCP, a multi-cache prototype learning framework for test-time adaptation of vision-language models, which improves zero-shot generalization by leveraging different cache types to enhance intra-class compactness and calibration.
Contribution
The paper proposes a multi-cache, prototype-based test-time adaptation (TTA) method built on three caches, and introduces MCP++, which adds cross-modal alignment and residual learning.
Findings
Achieves state-of-the-art results on 15 downstream tasks.
Demonstrates the effectiveness of multi-cache strategy over single-cache methods.
Shows significant improvement in intra-class compactness and prediction calibration.
Abstract
In the zero-shot setting, test-time adaptation (TTA) adjusts pre-trained models using unlabeled data from the test phase to enhance performance on unknown test distributions. Existing cache-enhanced TTA methods rely on a low-entropy criterion to select samples for prototype construction, assuming intra-class compactness. However, low-entropy samples may be unreliable under distribution shifts, and the resulting prototypes may not ensure compact intra-class distributions. This study identifies a positive correlation between cache-enhanced performance and intra-class compactness. Based on this observation, we propose a Multi-Cache enhanced Prototype-based Test-Time Adaptation method (MCP) featuring three caches: an entropy cache for initializing prototype representations with low-entropy samples, an align cache for integrating visual and textual information to achieve compact intra-class distributions,…
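The low-entropy selection criterion that the abstract attributes to existing cache-enhanced methods can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `EntropyCache`, the `capacity` parameter, and the use of a plain mean as the prototype are all assumptions made for illustration, and the align cache and the remaining cache are not modeled here.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyCache:
    """Hypothetical per-class cache that keeps the lowest-entropy features."""

    def __init__(self, capacity=3):
        self.capacity = capacity          # assumed: max entries per class
        self.items = defaultdict(list)    # class index -> [(entropy, feature)]

    def update(self, feature, logits):
        """Insert a test sample under its pseudo-label, evicting high-entropy entries."""
        probs = softmax(logits)
        pseudo_label = max(range(len(probs)), key=probs.__getitem__)
        h = entropy(probs)
        slot = self.items[pseudo_label]
        slot.append((h, list(feature)))
        slot.sort(key=lambda pair: pair[0])   # lowest entropy first
        del slot[self.capacity:]              # drop entries beyond capacity
        return pseudo_label, h

    def prototype(self, label):
        """Mean of the cached features for a class, or None if the class is empty."""
        feats = [f for _, f in self.items.get(label, [])]
        if not feats:
            return None
        dim = len(feats[0])
        return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]


if __name__ == "__main__":
    cache = EntropyCache(capacity=2)
    cache.update([1.0, 0.0], [5.0, 0.0])   # confident prediction for class 0
    cache.update([0.0, 1.0], [0.2, 0.0])   # uncertain prediction for class 0
    cache.update([3.0, 0.0], [4.0, 0.0])   # confident prediction for class 0
    print(cache.prototype(0))
```

In this toy run the uncertain sample is evicted once the cache is full, so the class prototype is built only from the two confident samples. The abstract's concern is that under distribution shift such confident samples can still be wrong, which is why entropy alone may not yield compact intra-class prototypes.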
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
