Semantic Relation-Enhanced CLIP Adapter for Domain Adaptive Zero-Shot Learning

Jiaao Yu; Mingjie Han; Jinkun Jiang; Junyu Dong; Tao Gong; Man Lan

arXiv:2510.21808·cs.CV·October 28, 2025

TL;DR

This paper introduces SRE-CLIP, a novel framework that enhances CLIP for domain adaptive zero-shot learning by leveraging semantic relations and maintaining cross-modal alignment, achieving state-of-the-art results.

Contribution

SRE-CLIP is the first CLIP-based DAZSL method. It adds semantic relation guidance to improve cross-category knowledge transfer and an alignment retention strategy to preserve cross-modal alignment during target-domain fine-tuning.

Findings

1. Achieves state-of-the-art performance on I2AwA and I2WebV benchmarks.
2. Significantly outperforms existing DAZSL approaches.
3. Effectively balances cross-domain transfer and cross-category generalization.

Abstract

The high cost of data annotation has spurred research on training deep learning models in data-limited scenarios. Existing paradigms, however, fail to balance cross-domain transfer and cross-category generalization, giving rise to the demand for Domain-Adaptive Zero-Shot Learning (DAZSL). Although vision-language models (e.g., CLIP) have inherent advantages in the DAZSL field, current studies do not fully exploit their potential. Applying CLIP to DAZSL faces two core challenges: inefficient cross-category knowledge transfer due to the lack of semantic relation guidance, and degraded cross-modal alignment during target domain fine-tuning. To address these issues, we propose a Semantic Relation-Enhanced CLIP (SRE-CLIP) Adapter framework, integrating a Semantic Relation Structure Loss and a Cross-Modal Alignment Retention Strategy. As the first CLIP-based DAZSL method, SRE-CLIP achieves…
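
The abstract names two components without giving their formulas here, so the following is only a rough illustration of the two ideas, not the paper's formulation. It assumes (hypothetically) that a relation-structure term compares the pairwise cosine-similarity matrix of class text embeddings with that of adapted class embeddings, and that an alignment-retention term penalizes adapted features for drifting away from the frozen CLIP features of the same input. All function names are invented for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relation_matrix(vectors):
    """Pairwise cosine similarities among a list of class embeddings."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def relation_structure_loss(text_embs, adapted_embs):
    """Assumed form: mean squared difference between the two relation matrices.

    A low value means the adapted class embeddings keep the same
    inter-class similarity structure as the text embeddings.
    """
    r_text = relation_matrix(text_embs)
    r_adapted = relation_matrix(adapted_embs)
    n = len(text_embs)
    total = sum(
        (r_text[i][j] - r_adapted[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def alignment_retention_loss(frozen_feats, adapted_feats):
    """Assumed form: mean (1 - cosine) between frozen and adapted features.

    Each pair holds the frozen-CLIP and adapter outputs for the same input.
    """
    gaps = [1.0 - cosine(f, a) for f, a in zip(frozen_feats, adapted_feats)]
    return sum(gaps) / len(gaps)


# Toy example: two classes whose text embeddings are orthogonal,
# but whose adapted embeddings have collapsed onto one direction.
text = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(relation_structure_loss(text, collapsed))
print(alignment_retention_loss(text, text))
```

In this toy case the relation term is large because the collapsed embeddings lose the distinction between the two classes, while the retention term is zero when the adapted features equal the frozen ones. How SRE-CLIP actually defines and weights its losses is not stated in the truncated abstract.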
