GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised Learning

Mainak Singha; Ankit Jha; Biplab Banerjee

arXiv:2308.11605 · cs.CV · August 23, 2023 · 1 citation

TL;DR

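GOPro introduces a unified prompt learning framework that combines CLIP with self-supervised learning (SSL) to improve domain generalization in visual recognition. It targets the loss-weighting and view-inconsistency problems that arise when CLIP's contrastive loss and an SSL loss are simply combined in a multi-task objective.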
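For context, the naive multi-task baseline that GOPro argues against can be written as CLIP's symmetric contrastive loss plus an SSL loss scaled by a hand-tuned weight. The notation below is an illustrative assumption, not the paper's: f and g are the image and text encoders, t_j is the text paired with image x_j, τ is a temperature, and λ is the SSL weight whose tuning the abstract identifies as a difficulty. GOPro's own losses are not reproduced here.

```latex
% Illustrative notation (assumed, not taken from the paper).
s_{ij} = \frac{f(x_i)^\top g(t_j)}{\lVert f(x_i)\rVert\,\lVert g(t_j)\rVert}

\mathcal{L}_{\mathrm{CLIP}} = -\frac{1}{2N}\sum_{i=1}^{N}\left[
  \log\frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N}\exp(s_{ij}/\tau)}
  + \log\frac{\exp(s_{ii}/\tau)}{\sum_{j=1}^{N}\exp(s_{ji}/\tau)}
\right]

% Naive multi-task objective: lambda must be tuned by hand.
\mathcal{L}_{\mathrm{MT}} = \mathcal{L}_{\mathrm{CLIP}} + \lambda\,\mathcal{L}_{\mathrm{SSL}}
```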

Contribution

It proposes a prompt learning model that places a pair of learnable image and text projectors atop CLIP to form a shared image-text embedding space, and trains it with multiple loss functions to improve the invariance and generalizability of CLIP-based models.

Findings

01 · Outperforms state-of-the-art prompting methods on multiple benchmarks

02 · Demonstrates significant improvements in domain generalization tasks

03 · Effectively combines CLIP with self-supervised learning for robust visual recognition

Abstract

Large-scale foundation models, such as CLIP, have demonstrated remarkable success in visual recognition tasks by embedding images in a semantically rich space. Self-supervised learning (SSL) has also shown promise in improving visual recognition by learning invariant features. However, combining CLIP with SSL faces challenges arising from the multi-task framework that blends CLIP's contrastive loss with an SSL loss, including difficulties in weighting the losses and inconsistency among different views of an image in CLIP's output space. To overcome these challenges, we propose a prompt learning-based model called GOPro, a unified framework that ensures similarity between various augmented views of input images in a shared image-text embedding space, using a pair of learnable image and text projectors atop CLIP, to promote invariance and generalizability. To automatically…
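The core idea in the abstract, keeping augmented views of an image close in a shared embedding space through a learnable projector on top of CLIP, can be illustrated with a symmetric InfoNCE-style consistency loss. This is a minimal, dependency-free sketch under stated assumptions: the input vectors stand in for frozen CLIP image features, `W` is a hypothetical linear image projector, `tau` is an assumed temperature, and InfoNCE is a swapped-in choice of consistency objective. It covers only the image side; how GOPro couples such a term with its text projector, prompts, and other losses is defined in the paper and the official repository, not here.

```python
import math


def project(W, x):
    """Apply a (hypothetical) linear projector W, given as a list of rows, to x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_normalize(v):
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def _cross_entropy_rows(rows, cols, tau):
    """Mean cross-entropy where row i's positive is column i (InfoNCE)."""
    n = len(rows)
    total = 0.0
    for i in range(n):
        logits = [sum(p * q for p, q in zip(rows[i], cols[j])) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n


def view_consistency_loss(view_a, view_b, W, tau=0.1):
    """Symmetric InfoNCE between two augmented views of the same batch.

    view_a[i] and view_b[i] stand in for frozen CLIP image features of two
    augmentations of image i (an assumption made for this sketch).
    """
    za = [l2_normalize(project(W, x)) for x in view_a]
    zb = [l2_normalize(project(W, x)) for x in view_b]
    return 0.5 * (_cross_entropy_rows(za, zb, tau) + _cross_entropy_rows(zb, za, tau))


if __name__ == "__main__":
    dim = 4
    identity = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    basis = [row[:] for row in identity]  # four orthogonal stand-in "image features"
    shifted = basis[1:] + basis[:1]       # pairs each image with the wrong view
    print(view_consistency_loss(basis, basis, identity))    # close to 0
    print(view_consistency_loss(basis, shifted, identity))  # roughly 10
```

With an identity projector and orthogonal features, correctly paired views give a loss near zero, while a shifted pairing gives a loss of roughly 1/tau = 10. In a real training loop the projector weights would be learned by gradient descent (for example with PyTorch autograd) rather than fixed.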

Code & Models

Repositories

mainaksingha01/gopro · PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training