Ultra-Light Test-Time Adaptation for Vision-Language Models

Byunghyun Kim

arXiv:2511.09101·cs.CV·November 13, 2025

TL;DR

UL-TTA is a training-free, lightweight test-time adaptation method for vision-language models that improves accuracy and calibration under domain shift by adapting only logit-level parameters with Bayesian updates.

Contribution

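The paper introduces UL-TTA, a training-free and backprop-free framework that freezes the backbone and adapts only logit-level parameters (class prototypes, class priors, and temperature) through closed-form Bayesian updates, which makes it suitable for streaming and edge scenarios.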

Findings

01. Consistently improves accuracy on large-scale benchmarks.

02. Reduces calibration error significantly.

03. Operates with less than 8% latency overhead.

Abstract

Vision-Language Models (VLMs) such as CLIP achieve strong zero-shot recognition by comparing image embeddings to text-derived class prototypes. However, under domain shift, they suffer from feature drift, class-prior mismatch, and severe miscalibration. Existing test-time adaptation (TTA) methods often require backpropagation through large backbones, covariance estimation, or heavy memory/state, which is problematic for streaming and edge scenarios. We propose Ultra-Light Test-Time Adaptation (UL-TTA), a fully training-free and backprop-free framework that freezes the backbone and adapts only logit-level parameters: class prototypes, class priors, and temperature. UL-TTA performs an online EM-style procedure with (i) selective sample filtering to use only confident predictions, (ii) closed-form Bayesian updates for prototypes and priors anchored by text and Dirichlet priors, (iii)…
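The steps that are visible in the abstract (zero-shot scoring against text prototypes, confidence-based filtering, and closed-form updates anchored by text and Dirichlet priors) can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the class name `LogitAdapter`, the pseudo-count `kappa`, the Dirichlet concentration `alpha`, the confidence threshold, and the fixed temperature are all assumptions chosen for illustration, and the temperature update and any later steps are omitted because the abstract is truncated before describing them.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

class LogitAdapter:
    """Hypothetical sketch: only prototypes and priors change; the backbone is frozen."""

    def __init__(self, text_protos, kappa=10.0, alpha=1.0,
                 temperature=0.01, conf_thresh=0.6):
        self.text = [normalize(p) for p in text_protos]  # text-derived anchors
        k, d = len(self.text), len(self.text[0])
        self.kappa = kappa              # assumed strength of the text anchor
        self.alpha = alpha              # assumed symmetric Dirichlet concentration
        self.temperature = temperature  # kept fixed here (assumption)
        self.conf_thresh = conf_thresh  # assumed confidence threshold
        self.feat_sum = [[0.0] * d for _ in range(k)]  # soft feature sums per class
        self.counts = [0.0] * k                        # soft counts per class

    def prototypes(self):
        # Closed form: text anchor weighted by kappa plus accumulated features.
        return [normalize([self.kappa * t + s for t, s in zip(tp, fs)])
                for tp, fs in zip(self.text, self.feat_sum)]

    def priors(self):
        # Posterior mean of class priors under a symmetric Dirichlet prior.
        total = sum(self.counts) + self.alpha * len(self.counts)
        return [(c + self.alpha) / total for c in self.counts]

    def predict(self, feat):
        x = normalize(feat)
        logits = [sum(a * b for a, b in zip(x, p)) / self.temperature + math.log(pi)
                  for p, pi in zip(self.prototypes(), self.priors())]
        return softmax(logits)

    def step(self, feat):
        """Predict one streamed sample, then update statistics if it is confident."""
        probs = self.predict(feat)           # E-step: class responsibilities
        if max(probs) >= self.conf_thresh:   # selective sample filtering
            x = normalize(feat)
            for c, r in enumerate(probs):    # M-step: accumulate soft statistics
                self.counts[c] += r
                for j, xj in enumerate(x):
                    self.feat_sum[c][j] += r * xj
        return probs
```

In this sketch the per-sample cost is a handful of vector operations over the class prototypes, with no gradients and no covariance estimates, which matches the kind of budget the abstract targets for streaming and edge use. Larger `kappa` keeps prototypes closer to the text embeddings, and larger `alpha` keeps the priors closer to uniform.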

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications