DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models

Eman Ali; Sathira Silva; Muhammad Haris Khan

arXiv:2408.08855 · cs.CV · December 3, 2024

TL;DR

DPA is an unsupervised domain adaptation method for vision-language models that uses dual prototypes and alignment techniques to improve performance on new tasks without labeled data.

Contribution

It introduces dual prototypes and prototype alignment to enhance unsupervised adaptation of vision-language models like CLIP.

Findings

1. DPA outperforms zero-shot CLIP on 13 vision tasks.
2. DPA achieves significant improvements over existing unsupervised adaptation methods.
3. The method effectively reduces misalignment between visual and textual representations.

Abstract

Vision-language models (VLMs), e.g., CLIP, have shown remarkable potential in zero-shot image classification. However, adapting these models to new domains remains challenging, especially in unsupervised settings where labeled data is unavailable. Recent research has proposed pseudo-labeling approaches to adapt CLIP in an unsupervised manner using unlabeled target data. Nonetheless, these methods struggle due to noisy pseudo-labels resulting from the misalignment between CLIP's visual and textual representations. This study introduces DPA, an unsupervised domain adaptation method for VLMs. DPA introduces the concept of dual prototypes, acting as distinct classifiers, along with the convex combination of their outputs, thereby leading to accurate pseudo-label construction. Next, it ranks pseudo-labels to facilitate robust self-training, particularly during early training. Finally, it…
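The dual-prototype idea in the abstract, two prototype sets acting as distinct classifiers whose outputs are mixed by a convex combination to form pseudo-labels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `dual_prototype_pseudo_labels`, the mixing weight `alpha`, the `temperature` value, and the use of cosine similarity followed by a softmax are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dual_prototype_pseudo_labels(features, text_protos, image_protos,
                                 alpha=0.5, temperature=0.01):
    """Hypothetical helper: mix two prototype classifiers into pseudo-labels.

    features:     (N, D) image embeddings
    text_protos:  (C, D) one textual prototype per class
    image_protos: (C, D) one visual prototype per class
    alpha:        assumed mixing weight in [0, 1] for the image classifier
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")

    f = l2_normalize(features)
    # Each prototype set acts as its own classifier over the C classes.
    p_text = softmax(f @ l2_normalize(text_protos).T / temperature)
    p_image = softmax(f @ l2_normalize(image_protos).T / temperature)

    # Convex combination of the two classifiers' class probabilities.
    probs = alpha * p_image + (1.0 - alpha) * p_text
    return probs.argmax(axis=1), probs


# Toy usage with 2 samples, 2 classes, 2-dimensional embeddings.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
text_protos = np.array([[1.0, 0.0], [0.0, 1.0]])
image_protos = np.array([[1.0, 0.1], [0.1, 1.0]])
labels, probs = dual_prototype_pseudo_labels(features, text_protos, image_protos)
```

Because both inputs to the mixture are probability distributions and `alpha` is restricted to [0, 1], each row of `probs` remains a valid distribution, so its maximum can be read as a confidence score when filtering or ordering pseudo-labels.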

Code & Models

Repositories

Externalhappy/DPA (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training