Semantic Anchor Transport: Robust Test-Time Adaptation for Vision-Language Models
Shambhavi Mishra, Julio Silva-Rodriguez, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

TL;DR
Semantic Anchor Transport (SAT) improves the robustness of vision-language models like CLIP during inference by aligning visual embeddings with text-based semantic anchors, effectively handling distributional shifts through test-time adaptation.
Contribution
The paper introduces SAT, a novel test-time adaptation method that uses optimal transport for pseudo-labeling and multi-template distillation, enhancing model robustness without extra computational cost.
Findings
SAT outperforms recent state-of-the-art methods on multiple benchmarks.
It achieves consistent performance gains across diverse test scenarios.
SAT is computationally efficient and suitable for real-world applications.
Abstract
Large pre-trained vision-language models (VLMs), such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models may be unreliable under distributional shifts, as their performance is significantly degraded. In this work, we investigate how to efficiently utilize class text information to mitigate distribution drifts encountered by VLMs during inference. In particular, we propose generating pseudo-labels for the noisy test-time samples by aligning visual embeddings with reliable, text-based semantic anchors. Specifically, to maintain the regular structure of the dataset properly, we formulate the problem as a batch-wise label assignment, which is efficiently solved using Optimal Transport. Our method, Semantic Anchor Transport (SAT), utilizes such pseudo-labels as supervisory signals for test-time adaptation, yielding a principled…
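The batch-wise label assignment described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes an entropy-regularized Optimal Transport problem solved with Sinkhorn-Knopp iterations, uniform marginals over samples and classes, and cosine similarity between image embeddings and text anchors. The function name `sinkhorn_pseudo_labels` and the parameters `epsilon` and `n_iters` are hypothetical choices made for illustration.

```python
import numpy as np


def sinkhorn_pseudo_labels(image_emb, text_anchors, epsilon=0.05, n_iters=50):
    """Soft pseudo-labels for a batch via entropy-regularized optimal transport.

    image_emb:    (B, D) visual embeddings of the test batch.
    text_anchors: (K, D) class text embeddings used as semantic anchors.
    Returns a (B, K) matrix whose rows are soft label assignments.
    Assumption: uniform marginals over samples and classes (illustrative only).
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_anchors = text_anchors / np.linalg.norm(text_anchors, axis=1, keepdims=True)
    sim = image_emb @ text_anchors.T  # (B, K)

    # Gibbs kernel; subtracting the max keeps the exponent numerically safe.
    Q = np.exp((sim - sim.max()) / epsilon)
    B, K = Q.shape
    row_marginal = np.full(B, 1.0 / B)  # each sample carries equal mass
    col_marginal = np.full(K, 1.0 / K)  # each class receives equal mass

    # Sinkhorn-Knopp: alternately rescale columns and rows toward the marginals.
    for _ in range(n_iters):
        Q *= (col_marginal / Q.sum(axis=0))[None, :]
        Q *= (row_marginal / Q.sum(axis=1))[:, None]

    # Rescale so that every row sums to 1 and reads as a label distribution.
    return Q * B


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))    # stand-in for CLIP image embeddings
anchors = rng.normal(size=(4, 16))  # stand-in for CLIP text embeddings
soft_labels = sinkhorn_pseudo_labels(batch, anchors)
hard_labels = soft_labels.argmax(axis=1)
```

The uniform class marginal is what makes the assignment batch-wise rather than per-sample: it discourages every image in the batch from collapsing onto the same class. Whether SAT uses exactly this marginal constraint is not stated in the text above.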
Taxonomy
Topics: Text Readability and Simplification · Digital Accessibility for Disabilities · Natural Language Processing Techniques
Methods: Contrastive Learning · Knowledge Distillation · Contrastive Language-Image Pre-training
