DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models
Eman Ali, Sathira Silva, Muhammad Haris Khan

TL;DR
DPA is an unsupervised domain adaptation method for vision-language models such as CLIP. It combines the outputs of two prototype-based classifiers to build more accurate pseudo-labels, which lets the model adapt to new domains without labeled data.
Contribution
It introduces dual prototypes and prototype alignment to enhance unsupervised adaptation of vision-language models like CLIP.
Findings
DPA outperforms zero-shot CLIP on 13 vision tasks.
DPA achieves significant improvements over existing unsupervised adaptation methods.
The method effectively reduces misalignment between visual and textual representations.
Abstract
Vision-language models (VLMs), e.g., CLIP, have shown remarkable potential in zero-shot image classification. However, adapting these models to new domains remains challenging, especially in unsupervised settings where labeled data is unavailable. Recent research has proposed pseudo-labeling approaches to adapt CLIP in an unsupervised manner using unlabeled target data. Nonetheless, these methods struggle due to noisy pseudo-labels resulting from the misalignment between CLIP's visual and textual representations. This study introduces DPA, an unsupervised domain adaptation method for VLMs. DPA introduces the concept of dual prototypes, acting as distinct classifiers, along with the convex combination of their outputs, thereby leading to accurate pseudo-label construction. Next, it ranks pseudo-labels to facilitate robust self-training, particularly during early training. Finally, it…
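To make the "convex combination of their outputs" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each prototype set holds one vector per class, that each set scores an image feature by temperature-scaled cosine similarity followed by a softmax, and that the two probability vectors are mixed with a weight `alpha` in [0, 1]. The names `prototype_probs`, `combined_pseudo_label`, `protos_a`, `protos_b`, `alpha`, and `temperature` are hypothetical; the abstract does not specify how the two prototype sets are built or weighted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_probs(feature, prototypes, temperature=0.1):
    """Class probabilities from one prototype set (one prototype per class).

    Assumption: similarity is cosine, scaled by a temperature, then softmaxed.
    """
    logits = [cosine(feature, p) / temperature for p in prototypes]
    return softmax(logits)


def combined_pseudo_label(feature, protos_a, protos_b, alpha=0.5, temperature=0.1):
    """Mix two prototype classifiers with a convex combination.

    alpha weights the first classifier and (1 - alpha) the second, so the
    mixed vector is still a valid probability distribution.
    Returns (predicted class index, mixed probabilities).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    probs_a = prototype_probs(feature, protos_a, temperature)
    probs_b = prototype_probs(feature, protos_b, temperature)
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(probs_a, probs_b)]
    label = max(range(len(mixed)), key=mixed.__getitem__)
    return label, mixed


if __name__ == "__main__":
    # Toy 2-D feature and two hypothetical prototype sets for two classes.
    feature = [1.0, 0.0]
    protos_a = [[1.0, 0.0], [0.0, 1.0]]
    protos_b = [[0.9, 0.1], [0.1, 0.9]]
    label, mixed = combined_pseudo_label(feature, protos_a, protos_b)
    print(label, mixed)
```

Because both inputs are probability vectors and the weights sum to one, the mixed output also sums to one, so its largest entry can serve directly as a confidence score for the pseudo-label.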
Taxonomy
Topics: Multimodal Machine Learning Applications
Methods: Contrastive Language-Image Pre-training
