UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval

Haoyu Jiang; Zhi-Qi Cheng; Gabriel Moreira; Jiawen Zhu; Jingdong Sun; Bukun Ren; Jun-Yan He; Qi Dai; Xian-Sheng Hua

arXiv:2412.10680 · cs.CV · December 17, 2024

TL;DR

UCDR-Adapter introduces a novel method combining adapters and dynamic prompt generation to improve universal cross-domain retrieval, enabling better adaptation to unseen domains and classes without relying on semantic labels.

Contribution

The paper proposes a two-phase training strategy, Source Adapter Learning followed by Target Prompt Generation, that adapts pre-trained vision-language models for zero-shot retrieval across diverse domains.

Findings

01. Outperforms state-of-the-art methods on multiple UCDR benchmarks

02. Demonstrates robust generalization to unseen domains and classes

03. Achieves higher retrieval accuracy with dynamic prompts during inference

Abstract

Universal Cross-Domain Retrieval (UCDR) retrieves relevant images from unseen domains and classes without semantic labels, ensuring robust generalization. Existing methods commonly employ prompt tuning with pre-trained vision-language models but are inherently limited by static prompts, reducing adaptability. We propose UCDR-Adapter, which enhances pre-trained models with adapters and dynamic prompt generation through a two-phase training strategy. First, Source Adapter Learning integrates class semantics with domain-specific visual knowledge using a Learnable Textual Semantic Template and optimizes Class and Domain Prompts via momentum updates and dual loss functions for robust alignment. Second, Target Prompt Generation creates dynamic prompts by attending to masked source prompts, enabling seamless adaptation to unseen domains and classes. Unlike prior approaches, UCDR-Adapter…
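
The Target Prompt Generation step described in the abstract can be pictured with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: it applies plain scaled dot-product attention from a query feature to the source prompts that survive a mask, and returns the weighted sum as a dynamic prompt. The function name `generate_dynamic_prompt`, the `keep_mask` argument, the vector dimension, and the random values are all hypothetical; the abstract does not specify the masking scheme or the exact attention form.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_dynamic_prompt(query, source_prompts, keep_mask):
    """Hypothetical helper: attend from `query` to the unmasked source prompts.

    query          -- feature vector (list of floats)
    source_prompts -- list of prompt vectors with the same length as `query`
    keep_mask      -- list of bools; False means the prompt is masked out
    """
    dim = len(query)
    kept = [p for p, keep in zip(source_prompts, keep_mask) if keep]
    if not kept:
        raise ValueError("at least one source prompt must stay unmasked")

    # Scaled dot-product score between the query and each kept prompt.
    scores = [
        sum(q * k for q, k in zip(query, p)) / math.sqrt(dim) for p in kept
    ]
    weights = softmax(scores)

    # Dynamic prompt = attention-weighted sum of the kept prompts.
    prompt = [sum(w * p[i] for w, p in zip(weights, kept)) for i in range(dim)]
    return prompt, weights


# Toy data (assumed values, for illustration only).
random.seed(0)
dim = 4
source_prompts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
query = [random.gauss(0, 1) for _ in range(dim)]
keep_mask = [True, False, True, True, False]

dynamic_prompt, weights = generate_dynamic_prompt(query, source_prompts, keep_mask)
```

In this toy setup, masking forces the generated prompt to be built from only part of the source prompt set. Whether the paper masks prompts in this way is an assumption here.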

Code & Models

Repositories

fine68/ucdr2024 (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Adapter