Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score

Eman Ali; Sathira Silva; Chetan Arora; Muhammad Haris Khan

arXiv:2507.09615 · cs.CV · July 15, 2025

TL;DR

This paper introduces FAIR, a method for unsupervised adaptation of CLIP to fine-grained classification. FAIR dynamically aligns localized image features with descriptive text embeddings, which improves pseudo-label accuracy and overall performance.

Contribution

The paper contributes an adaptive, self-trained alignment score and an interaction refinement technique for fine-grained adaptation of vision-language models.

Findings

01. Achieves a 2.78% overall gain over SOTA methods across 13 datasets.

02. Improves pseudo-label quality through dynamic cross-modal interactions.

03. Enhances fine-grained classification accuracy in unsupervised settings.

Abstract

Vision-language models (VLMs) like CLIP excel in zero-shot learning by aligning image and text representations through contrastive pretraining. Existing approaches to unsupervised adaptation (UA) for fine-grained classification with VLMs either rely on fixed alignment scores that cannot capture evolving, subtle class distinctions or use computationally expensive pseudo-labeling strategies that limit scalability. In contrast, we show that modeling fine-grained cross-modal interactions during adaptation produces more accurate, class-discriminative pseudo-labels and substantially improves performance over state-of-the-art (SOTA) methods. We introduce Fine-grained Alignment and Interaction Refinement (FAIR), an innovative approach that dynamically aligns localized image features with descriptive language embeddings through a set of Class Description Anchors (CDA). This enables the…
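
For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of pseudo-labeling with a fixed alignment score: cosine similarity between an image embedding and one text embedding per class, followed by a softmax. It is not FAIR's method and does not implement Class Description Anchors. The function names, the temperature value, and the random stand-in embeddings are assumptions made for illustration; in practice the features would come from CLIP's image and text encoders.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fixed_score_pseudo_labels(image_feats, text_feats, temperature=0.01):
    """Assign each image the class whose text embedding is most similar.

    image_feats: (num_images, dim) array of image embeddings.
    text_feats:  (num_classes, dim) array, one embedding per class prompt.
    temperature: softmax temperature (hypothetical value, chosen for illustration).
    Returns (labels, confidences), each of shape (num_images,).
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_feats)
    logits = img @ txt.T / temperature            # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)


# Stand-in embeddings (assumption): each image is a noisy copy of a class vector.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(3, 8))
image_feats = text_feats[[2, 0, 1, 2]] + 0.05 * rng.normal(size=(4, 8))

labels, confidences = fixed_score_pseudo_labels(image_feats, text_feats)
```

Because the text embeddings stay frozen here, the score cannot change as adaptation proceeds, which is the limitation the abstract attributes to fixed alignment scores.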

Taxonomy

Topics: Subtitles and Audiovisual Media · Cancer-related molecular mechanisms research