DOTA: Distributional Test-Time Adaptation of Vision-Language Models

Zongbo Han; Jialong Yang; Guangyu Wang; Junfan Li; Qianli Xu; Mike Zheng Shou; Changqing Zhang

arXiv:2409.19375·cs.LG·September 29, 2025

Open Access

TL;DR

DOTA introduces a distribution-based test-time adaptation method for vision-language models that continuously estimates test data distributions to improve robustness and reduce catastrophic forgetting during deployment.

Contribution

The paper proposes a distribution-centric approach to test-time adaptation that addresses the cache management limitations of cache-based adapters and improves model robustness in deployment.

Findings

01 DOTA outperforms existing test-time adaptation methods.

02 It significantly reduces catastrophic forgetting.

03 It achieves state-of-the-art performance on benchmark datasets.

Abstract

Vision-language foundation models (VLMs), such as CLIP, exhibit remarkable performance across a wide range of tasks. However, deploying these models can be unreliable when significant distribution gaps exist between training and test data, while fine-tuning for diverse scenarios is often costly. Cache-based test-time adapters offer an efficient alternative by storing representative test samples to guide subsequent classifications. Yet, these methods typically employ naive cache management with limited capacity, leading to severe catastrophic forgetting when samples are inevitably dropped during updates. In this paper, we propose DOTA (DistributiOnal Test-time Adaptation), a simple yet effective method addressing this limitation. Crucially, instead of merely memorizing individual test samples, DOTA continuously estimates the underlying distribution of the test data stream. Test-time…
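The contrast the abstract draws, between memorizing a bounded set of samples and continuously estimating a distribution, can be illustrated with a small sketch. This is not the DOTA algorithm: the abstract does not specify the distribution family or the update rule, so the code assumes a per-class Gaussian with diagonal variance, updated with a standard weighted running-moment formula. The class names `FixedCache` and `RunningGaussian`, the optional `weight` argument (standing in for a soft pseudo-label confidence), and the toy feature stream are all hypothetical.

```python
from collections import deque


class FixedCache:
    """Hypothetical bounded cache: keeps only the most recent `capacity` features."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts the oldest item once full.
        self.items = deque(maxlen=capacity)

    def add(self, feature):
        self.items.append(feature)

    def mean(self):
        n = len(self.items)
        dim = len(self.items[0])
        return [sum(f[i] for f in self.items) / n for i in range(dim)]


class RunningGaussian:
    """Assumed estimator: weighted running mean and diagonal variance."""

    def __init__(self, dim):
        self.weight = 0.0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # weighted sum of squared deviations

    def update(self, feature, weight=1.0):
        if weight <= 0.0:
            return  # a zero-weight sample carries no information
        self.weight += weight
        for i, x in enumerate(feature):
            delta = x - self.mean[i]
            self.mean[i] += (weight / self.weight) * delta
            self.m2[i] += weight * delta * (x - self.mean[i])

    def variance(self):
        return [m / self.weight for m in self.m2]


# Toy stream of 2-D features for a single class (made-up values).
stream = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0], [6.0, 12.0]]
cache = FixedCache(capacity=2)
stats = RunningGaussian(dim=2)
for feature in stream:
    cache.add(feature)
    stats.update(feature)

print(cache.mean())      # reflects only the last two samples
print(stats.mean)        # reflects all four samples
print(stats.variance())
```

In this sketch the cache summary depends only on the samples that survived eviction, while the running estimate incorporates every sample in constant memory, which is the intuition behind estimating a distribution instead of storing samples.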

Videos

DOTA: Distributional Test-time Adaptation of Vision-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Adapter · Contrastive Language-Image Pre-training