Semantic Anchor Transport: Robust Test-Time Adaptation for Vision-Language Models

Shambhavi Mishra; Julio Silva-Rodriguez; Ismail Ben Ayed; Marco Pedersoli; Jose Dolz

arXiv:2411.17002 · cs.CV · January 5, 2026

TL;DR

Semantic Anchor Transport (SAT) improves the robustness of vision-language models like CLIP during inference by aligning visual embeddings with text-based semantic anchors, effectively handling distributional shifts through test-time adaptation.

Contribution

The paper introduces SAT, a novel test-time adaptation method that uses optimal transport for pseudo-labeling and multi-template distillation, enhancing model robustness without extra computational cost.

Findings

1. SAT outperforms recent state-of-the-art methods on multiple benchmarks.
2. It achieves consistent performance gains across diverse test scenarios.
3. SAT is computationally efficient and suitable for real-world applications.

Abstract

Large pre-trained vision-language models (VLMs), such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models may be unreliable under distributional shifts, as their performance is significantly degraded. In this work, we investigate how to efficiently utilize class text information to mitigate distribution drifts encountered by VLMs during inference. In particular, we propose generating pseudo-labels for the noisy test-time samples by aligning visual embeddings with reliable, text-based semantic anchors. Specifically, to maintain the regular structure of the dataset properly, we formulate the problem as a batch-wise label assignment, which is efficiently solved using Optimal Transport. Our method, Semantic Anchor Transport (SAT), utilizes such pseudo-labels as supervisory signals for test-time adaptation, yielding a principled…
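The batch-wise label assignment described in the abstract can be illustrated with a small entropic optimal transport sketch. This is not the authors' implementation: the function names `cosine_similarity` and `sinkhorn_pseudo_labels`, the temperature `epsilon`, the iteration count, and in particular the uniform class marginal (each class receives an equal share of the batch) are assumptions made for illustration. Plain Python lists stand in for the visual and text embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sinkhorn_pseudo_labels(visual, anchors, epsilon=0.1, n_iters=50):
    """Soft pseudo-labels for a batch via Sinkhorn iterations.

    visual  : list of image embeddings (one per test sample)
    anchors : list of text embeddings (one per class), the "semantic anchors"
    Assumption: uniform marginals over both samples and classes.
    Returns a batch x classes matrix whose rows each sum to 1.
    """
    n, k = len(visual), len(anchors)
    # Gibbs kernel: higher similarity means cheaper transport.
    plan = [
        [math.exp(cosine_similarity(x, t) / epsilon) for t in anchors]
        for x in visual
    ]
    for _ in range(n_iters):
        # Column step: every class receives total mass 1/k.
        col_sums = [sum(plan[i][j] for i in range(n)) for j in range(k)]
        plan = [[plan[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Row step: every sample carries total mass 1/n.
        row_sums = [sum(row) for row in plan]
        plan = [[p / (row_sums[i] * n) for p in row] for i, row in enumerate(plan)]
    # Rescale so each row is a probability distribution over classes.
    return [[p * n for p in row] for row in plan]


# Toy batch: two samples near each of two hypothetical class anchors.
visual = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
anchors = [[1.0, 0.0], [0.0, 1.0]]

soft = sinkhorn_pseudo_labels(visual, anchors)
hard = [max(range(len(row)), key=row.__getitem__) for row in soft]
print(hard)
```

In this toy batch the first two samples are assigned to class 0 and the last two to class 1. The column step is what distinguishes the assignment from a per-sample softmax: labels are decided jointly over the batch, so no single class can absorb all samples. Whether a uniform class prior suits a given test stream is a design choice that this sketch does not settle.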

Code & Models

Repositories

ShambhaviCodes/CLIPOT (PyTorch, official)

Taxonomy

Topics: Text Readability and Simplification · Digital Accessibility for Disabilities · Natural Language Processing Techniques

Methods: Contrastive Learning · Knowledge Distillation · Contrastive Language-Image Pre-training