UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models
Jiachen Liang, Ruibing Hou, Minyang Hu, Hong Chang, Shiguang Shan, Xilin Chen

TL;DR
This paper introduces UMFC, a training-free, label-free method to calibrate features in vision-language models like CLIP, reducing domain bias and improving zero-shot transfer across multiple domains without additional labeled data.
Contribution
The paper proposes a novel unsupervised, training-free feature calibration technique to mitigate domain bias in CLIP, enhancing its transferability without extra annotations or optimization.
Findings
UMFC effectively reduces domain bias in CLIP's features.
UMFC outperforms baseline CLIP across multiple domain-transfer tasks.
UMFC achieves comparable results to state-of-the-art methods requiring labels or training.
Abstract
Pre-trained vision-language models (e.g., CLIP) have shown powerful zero-shot transfer capabilities. However, they still struggle with domain shifts and typically require labeled data to adapt to downstream tasks, which can be costly. In this work, we aim to leverage unlabeled data that naturally spans multiple domains to enhance the transferability of vision-language models. Under this unsupervised multi-domain setting, we have identified inherent model bias within CLIP, notably in its visual and text encoders. Specifically, we observe that CLIP's visual encoder tends to prioritize encoding domain information over discriminative category information, while its text encoder exhibits a preference for domain-relevant classes. To mitigate this model bias, we propose a training-free and label-free feature calibration method, Unsupervised Multi-domain Feature Calibration (UMFC). UMFC estimates…
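To make the visual-encoder bias described in the abstract concrete, the toy sketch below shows how a domain-dependent offset in image features can override category information under cosine-similarity zero-shot classification, and how subtracting a per-domain feature mean can remove it. This is an illustration under stated assumptions, not UMFC's procedure, which the truncated abstract above does not spell out. The feature vectors, the offsets, the helper names, and the use of known domain ids are all hypothetical; in the unsupervised setting the domain grouping would itself have to be estimated from unlabeled data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(image_feat, text_feats):
    """Zero-shot prediction: index of the most similar class text feature."""
    sims = [cosine(image_feat, t) for t in text_feats]
    return sims.index(max(sims))

def mean_vector(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def accuracy(feats, labels, text_feats):
    hits = sum(predict(f, text_feats) == y for f, y in zip(feats, labels))
    return hits / len(labels)

# Hypothetical class text features (one per class) in a 3-d toy space.
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Hypothetical domain offsets: each domain pushes every image feature
# toward one class direction, regardless of the true category.
domain_offsets = {"A": [2.0, 0.0, 0.0], "B": [0.0, 2.0, 0.0]}

# Toy image feature = class direction + domain offset.
samples = []  # (domain, label, feature)
for domain, offset in domain_offsets.items():
    for label, proto in enumerate(text_feats):
        samples.append((domain, label, [p + o for p, o in zip(proto, offset)]))

labels = [y for _, y, _ in samples]  # used for evaluation only
raw_feats = [f for _, _, f in samples]

# Label-free calibration: subtract each domain's mean image feature.
# Assumption: domain ids are known and classes are balanced per domain.
domain_means = {
    d: mean_vector([f for dom, _, f in samples if dom == d])
    for d in domain_offsets
}
calibrated_feats = [
    [x - m for x, m in zip(f, domain_means[d])] for d, _, f in samples
]

raw_acc = accuracy(raw_feats, labels, text_feats)
calibrated_acc = accuracy(calibrated_feats, labels, text_feats)
print(raw_acc, calibrated_acc)  # prints: 0.5 1.0
```

In this toy setup the raw features are classified by domain rather than by category, so half of the samples are wrong; after the per-domain mean is removed, the remaining direction carries only the category signal. Class labels appear solely in the accuracy check, and no parameters are trained.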
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Contrastive Language-Image Pre-training
