Prototype-Based Test-Time Adaptation of Vision-Language Models

Zhaohong Huang; Yuxin Zhang; Wenjing Liu; Fei Chao; Rongrong Ji

arXiv:2604.21360 · cs.CV · May 14, 2026

TL;DR

This paper introduces Prototype-Based Test-Time Adaptation (PTA), an efficient test-time adaptation method for vision-language models. PTA accumulates test-time knowledge in class-specific prototypes instead of a sample cache, which improves accuracy while avoiding the inference overhead of cache-based methods.

Contribution

PTA is a novel TTA approach that adaptively updates class prototypes based on test samples, eliminating cache-related inefficiencies and achieving state-of-the-art results.

Findings

01. PTA improves CLIP's average accuracy across 10 benchmarks from 65.64% to 69.38%.

02. PTA retains 92% of CLIP's inference speed on large-scale datasets.

03. PTA outperforms cache-based TTA methods in both accuracy and efficiency.

Abstract

Test-time adaptation (TTA) has emerged as a promising paradigm for vision-language models (VLMs) to bridge the distribution gap between pre-training and test data. Recent works have focused on backpropagation-free TTA methods that rely on cache-based designs, but these introduce two key limitations. First, inference latency increases as the cache grows with the number of classes, leading to inefficiencies in large-scale settings. Second, performance degrades when the cache contains insufficient or incorrect samples. In this paper, we present Prototype-Based Test-Time Adaptation (PTA), an efficient and effective TTA paradigm that uses a set of class-specific knowledge prototypes to accumulate knowledge from test samples. In particular, knowledge prototypes are adaptively weighted based on the zero-shot class confidence of each test sample, incorporating the sample's visual…
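The abstract is cut off before the full update rule, so the following is only a minimal sketch of the general idea it describes, not the authors' PTA implementation. It assumes CLIP-style L2-normalized text and image embeddings, takes zero-shot confidence to be a softmax over cosine similarities scaled by a logit scale of 100, starts each class prototype from its text embedding, and folds every test feature into all class prototypes weighted by that class's zero-shot confidence (predict first, then update). The names `PrototypeTTA`, `step`, and `prior_weight` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    # L2-normalize a vector (left unchanged if it is all zeros).
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(z: List[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class PrototypeTTA:
    """Hypothetical confidence-weighted prototype accumulator (illustrative only)."""

    def __init__(self, text_embeds: List[Vector], logit_scale: float = 100.0,
                 prior_weight: float = 1.0):
        self.text = [normalize(t) for t in text_embeds]
        self.scale = logit_scale
        # One running sum per class, seeded with the weighted text embedding,
        # so memory stays at one vector per class however many samples arrive.
        self.sums = [[prior_weight * a for a in t] for t in self.text]
        self.protos = [list(t) for t in self.text]

    def step(self, image_feat: Vector) -> int:
        """Classify one test feature with the current prototypes, then update them."""
        x = normalize(image_feat)
        # Zero-shot class confidence from the frozen text embeddings.
        zs_probs = softmax([self.scale * dot(x, t) for t in self.text])
        # Prediction uses the prototypes as they were before this sample.
        logits = [self.scale * dot(x, p) for p in self.protos]
        pred = max(range(len(logits)), key=logits.__getitem__)
        # Add the visual feature to every class, weighted by its zero-shot confidence.
        for c, w in enumerate(zs_probs):
            self.sums[c] = [s + w * a for s, a in zip(self.sums[c], x)]
            self.protos[c] = normalize(self.sums[c])
        return pred
```

Because the sketch keeps exactly one prototype per class, each test sample costs one similarity per class, the same order as plain zero-shot classification. A cache-based design instead compares against every stored sample. Whether PTA itself combines prototype and zero-shot logits, or uses a different weighting, is not stated in the truncated abstract.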
