CLIP Adaptation by Intra-modal Overlap Reduction

Alexey Kravets; Vinay Namboodiri

arXiv:2409.11338 · cs.CV · September 18, 2024

Open Access

TL;DR

This paper analyzes the intra-modal overlap in CLIP's image embeddings and proposes a lightweight adapter to reduce this overlap, leading to improved few-shot classification accuracy, robustness, and feature discriminability.

Contribution

The paper reduces intra-modal overlap with a lightweight adapter trained on generic samples from the Google Open Images dataset, improving CLIP's few-shot, training-free classification accuracy and robustness.

Findings

01 Reduced intra-modal overlap improves classification accuracy

02 Enhanced robustness to distribution shifts

03 Features become more discriminative for downstream tasks

Abstract

Numerous methods have been proposed to adapt a pre-trained foundational CLIP model for few-shot classification. As CLIP is trained on a large corpus, it generalises well through adaptation to few-shot classification. In this work, we analyse the intra-modal overlap in image space in terms of embedding representation. Our analysis shows that, due to contrastive learning, embeddings from the CLIP model exhibit a high overlap between the cosine similarity distributions of paired and unpaired examples in the image space, which affects the performance of few-shot training-free classification methods that rely on similarity in the image space for their predictions. To tackle intra-modal overlap, we propose to train a lightweight adapter on a generic set of samples from the Google Open Images dataset, demonstrating that this improves accuracy for few-shot training-free classification. We validate our contribution…
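One way to make the notion of intra-modal overlap concrete is to compare two histograms of image-to-image cosine similarities and measure how much they share. The sketch below is illustrative only and is not taken from the paper: it assumes that "paired" means two images with the same class label and "unpaired" means two images with different labels, and it uses the histogram intersection (overlap coefficient) as the overlap measure. The function names `cosine_similarity_matrix` and `intra_modal_overlap`, the bin count, and the synthetic embeddings are all hypothetical choices; in practice the embeddings would come from a CLIP image encoder.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clip to guard against tiny floating-point excursions outside [-1, 1].
    return np.clip(normed @ normed.T, -1.0, 1.0)


def intra_modal_overlap(embeddings, labels, bins=50):
    """Histogram intersection between same-label and different-label similarities.

    Assumption (not from the paper): "paired" = same class label,
    "unpaired" = different class labels. Returns a value in [0, 1];
    0 means the two similarity distributions are disjoint.
    """
    sims = cosine_similarity_matrix(embeddings)
    upper = np.triu_indices(len(labels), k=1)  # each unordered pair once, no diagonal
    pair_sims = sims[upper]
    same_label = (labels[:, None] == labels[None, :])[upper]

    edges = np.linspace(-1.0, 1.0, bins + 1)
    paired_hist, _ = np.histogram(pair_sims[same_label], bins=edges)
    unpaired_hist, _ = np.histogram(pair_sims[~same_label], bins=edges)

    # Normalise counts to probabilities, then sum the bin-wise minimum.
    paired_prob = paired_hist / paired_hist.sum()
    unpaired_prob = unpaired_hist / unpaired_hist.sum()
    return float(np.minimum(paired_prob, unpaired_prob).sum())


if __name__ == "__main__":
    # Synthetic stand-in for image embeddings: 4 classes, 10 samples each.
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(4), 10)
    centers = np.eye(4, 16)[labels]
    well_separated = centers + 0.01 * rng.standard_normal((40, 16))
    entangled = centers + 1.0 * rng.standard_normal((40, 16))
    print("overlap (well separated):", intra_modal_overlap(well_separated, labels))
    print("overlap (entangled):     ", intra_modal_overlap(entangled, labels))
```

Under these assumptions, a lower value means that similarity in image space separates same-class from different-class examples more cleanly, which is the property that similarity-based training-free classifiers depend on.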

Taxonomy

Topics: Subtitles and Audiovisual Media

Methods: Sparse Evolutionary Training · Adapter · Contrastive Language-Image Pre-training