CLAIR: CLIP-Aided Weakly Supervised Zero-Shot Cross-Domain Image Retrieval

Chor Boon Tan; Conghui Hu; Gim Hee Lee

arXiv:2508.12290 · cs.CV · August 19, 2025

TL;DR

CLAIR leverages CLIP-generated pseudo-labels and contrastive learning to improve weakly supervised zero-shot cross-domain image retrieval, effectively handling noisy labels and domain discrepancies.

Contribution

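This paper introduces CLAIR, a novel framework that refines CLIP pseudo-labels with confidence scores, employs inter-instance, inter-cluster, and inter-domain contrastive losses, and learns a closed-form cross-domain mapping from CLIP text embeddings, together with learnable prompts, to enhance zero-shot cross-domain image retrieval.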
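This page does not give the exact confidence formula. As a minimal sketch, assuming the confidence is the softmax probability of CLIP's top-scoring class (cosine similarities between an image embedding and per-class text embeddings, divided by a temperature), refinement could keep only pseudo-labels whose confidence clears a threshold. The function names and the `temperature` and `threshold` values are hypothetical, and in practice the feature vectors would come from CLIP's image and text encoders rather than being written by hand.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector stays all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def pseudo_label_with_confidence(
    image_feat: Sequence[float],
    text_feats: Sequence[Sequence[float]],
    temperature: float = 0.01,  # hypothetical; roughly the logit scale of released CLIP models
) -> Tuple[int, float]:
    """Return (pseudo-label, confidence) from CLIP-style zero-shot scores."""
    img = l2_normalize(image_feat)
    # Cosine similarity between the image and each class-prompt text embedding.
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_feats]
    logits = [s / temperature for s in sims]
    # Numerically stable softmax over classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def refine_pseudo_labels(
    image_feats: Sequence[Sequence[float]],
    text_feats: Sequence[Sequence[float]],
    threshold: float = 0.6,  # hypothetical cut-off
) -> List[Tuple[int, int, float]]:
    """Keep (sample index, pseudo-label, confidence) only for confident samples."""
    kept = []
    for idx, feat in enumerate(image_feats):
        label, conf = pseudo_label_with_confidence(feat, text_feats)
        if conf >= threshold:
            kept.append((idx, label, conf))
    return kept
```

Dropping ambiguous samples is only one option under this assumption; weighting each sample's training loss by its confidence would keep all the data while down-weighting likely-noisy labels.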

Findings

01

CLAIR outperforms existing methods on multiple zero-shot datasets.

02

The confidence-based pseudo-label refinement improves retrieval accuracy.

03

Cross-domain mapping with CLIP embeddings effectively reduces domain gaps.
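The abstract states that the cross-domain mapping is learned in closed form from CLIP text embeddings alone, but not its exact objective. A minimal sketch, assuming a ridge-regression objective: stack the text embeddings of the same class names under a source-domain prompt as the rows of X (e.g. the hypothetical template "a sketch of a {class}") and under a target-domain prompt as the rows of Y (e.g. "a photo of a {class}"), solve W = (XᵀX + λI)⁻¹XᵀY, then multiply source-domain image features by W. The prompt templates, `lam`, and all function names are hypothetical; the solver is plain Gauss-Jordan elimination to keep the example dependency-free.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(a: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + brow[:] for row, brow in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_text_mapping(src_text: Matrix, tgt_text: Matrix, lam: float = 1e-3) -> Matrix:
    """Closed-form ridge solution W = (X^T X + lam*I)^-1 X^T Y; X, Y are N x d."""
    xt = transpose(src_text)
    gram = matmul(xt, src_text)
    for i in range(len(gram)):
        gram[i][i] += lam  # lam > 0 keeps the system positive definite
    return solve(gram, matmul(xt, tgt_text))


def map_features(feats: Matrix, w: Matrix) -> Matrix:
    """Project source-domain features (rows) into the target domain."""
    return matmul(feats, w)
```

The design choice here is that a map fitted on text embeddings is reused on image features, which relies on CLIP placing text and images in a shared embedding space; how well that transfers is exactly what the paper's finding above addresses.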

Abstract

The recent growth of large foundation models that can easily generate pseudo-labels for huge quantities of unlabeled data makes unsupervised Zero-Shot Cross-Domain Image Retrieval (UZS-CDIR) less relevant. In this paper, we therefore turn our attention to weakly supervised ZS-CDIR (WSZS-CDIR) with noisy pseudo-labels generated by large foundation models such as CLIP. To this end, we propose CLAIR, which refines the noisy pseudo-labels with a confidence score computed from the similarity between CLIP text and image features. Furthermore, we design inter-instance and inter-cluster contrastive losses to encode images into a class-aware latent space, and an inter-domain contrastive loss to alleviate domain discrepancies. We also learn a novel cross-domain mapping function in closed form, using only CLIP text embeddings to project image features from one domain to another, thereby further aligning the…
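The abstract names three contrastive losses but not their formulas. The sketch below is a generic InfoNCE loss, not CLAIR's exact objective: each anchor embedding should score highest against its own positive among all positives in the batch. Assuming the losses follow this common pattern, an inter-instance variant would pair two augmented views of the same image, an inter-cluster variant might pair cluster centroids, and an inter-domain variant might pair, say, a sketch with a photo sharing its pseudo-label. These pairing rules, the `temperature` value, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i] among all positives."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every positive, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        total += log_z - logits[i]  # negative log-softmax at the matching index
    return total / len(a)
```

Matched pairs drive the loss toward zero, while mismatched pairs are penalized, which is what pulls same-class (or cross-domain corresponding) embeddings together in the latent space.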

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications