Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
Kazuki Adachi, Shin'ya Yamaguchi, Tomoki Hamagami

TL;DR
This paper introduces UnInfo, a test-time adaptation method for vision-language models such as CLIP. UnInfo improves robustness against sensor degradation by maintaining embedding uniformity and information balance during adaptation.
Contribution
The paper proposes a novel uniformity-aware test-time adaptation method called UnInfo that specifically addresses sensor degradation in vision-language models, a challenge not tackled by existing methods.
Findings
UnInfo improves accuracy on sensor-degraded images.
UnInfo maintains embedding uniformity and information balance during adaptation.
UnInfo outperforms existing TTA methods under sensor degradation conditions.
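The summary above does not say how UnInfo measures embedding uniformity. As a rough illustration only, the sketch below uses the widely cited uniformity measure of Wang and Isola (the log of the mean Gaussian potential over pairs of unit-norm embeddings), which is an assumption here and may differ from the paper's definition. The function names, the temperature t=2.0, and the toy 2-D embeddings are all hypothetical.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def uniformity_loss(embeddings, t=2.0):
    """Log of the mean of exp(-t * ||u - v||^2) over all pairs.

    Assumed measure (Wang and Isola style), not taken from the paper.
    Lower values mean the embeddings are spread more evenly on the sphere.
    """
    unit = [l2_normalize(e) for e in embeddings]
    terms = []
    for u, v in combinations(unit, 2):
        sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
        terms.append(math.exp(-t * sq_dist))
    return math.log(sum(terms) / len(terms))


# Toy data: one well-spread set and one nearly collapsed set.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed = [[1.0, 0.0], [0.99, 0.1], [0.98, -0.1], [1.0, 0.05]]

print(uniformity_loss(spread) < uniformity_loss(collapsed))
```

Under this measure, embeddings that cluster together score close to 0, while evenly spread embeddings score more negative. A uniformity-aware method would aim to keep this quantity from drifting toward 0 during adaptation.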
Abstract
Pre-trained vision-language models such as contrastive language-image pre-training (CLIP) have demonstrated remarkable generalizability, which has enabled a wide range of applications, most notably zero-shot classification. However, vision-language models still struggle on datasets that differ substantially from their training data, i.e., under distribution shifts. We found that CLIP is especially vulnerable to sensor degradation, a type of realistic distribution shift caused by sensor conditions such as weather, light, or noise. Collecting a new dataset from a test distribution for fine-tuning is costly because sensor degradation occurs unexpectedly and takes many forms. Thus, we investigate test-time adaptation (TTA) of zero-shot classification, which enables on-the-fly adaptation to the test distribution with unlabeled test data. Existing TTA methods for CLIP mainly focus on modifying…
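To make the zero-shot classification setting in the abstract concrete, here is a minimal sketch under stated assumptions: class scores are cosine similarities between an image embedding and one text embedding per class, scaled and passed through a softmax. The embeddings are hand-written toy vectors rather than real CLIP outputs, the class names are hypothetical, and logit_scale=100.0 is only roughly the value commonly reported for CLIP. The entropy helper shows the kind of unlabeled confidence signal many TTA methods rely on; it is not a description of UnInfo's objective.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between image and class texts."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical classes and toy embeddings (not real CLIP features).
class_names = ["cat", "dog", "car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
clean_image = [0.9, 0.1, 0.0]      # clearly aligned with "cat"
degraded_image = [0.5, 0.45, 0.4]  # pulled toward all classes at once

clean_probs = zero_shot_probs(clean_image, text_embs)
degraded_probs = zero_shot_probs(degraded_image, text_embs)
print(class_names[clean_probs.index(max(clean_probs))])
print(entropy(degraded_probs) > entropy(clean_probs))
```

In this toy setup the degraded embedding sits closer to all class texts at once, so the prediction is less confident. No labels are needed to compute the entropy, which is why signals of this kind can be used at test time.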
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
Methods: Focus · Contrastive Language-Image Pre-training · Knowledge Distillation
