UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement
Xiao Zhang, Fei Wei, Yong Wang, Wenda Zhao, Feiyi Li, Xiangxiang Chu

TL;DR
UPRE introduces a unified framework that enhances zero-shot domain adaptation for object detection by jointly optimizing prompts and visual representations, effectively addressing domain shifts and task-model misalignments.
Contribution
The paper proposes a novel UPRE framework that combines multi-view domain prompts with visual representation enhancement, improving zero-shot detection across diverse domains.
Findings
Outperforms existing methods on nine benchmark datasets
Effectively aligns multi-modal representations at multiple levels
Demonstrates significant improvements in zero-shot detection accuracy
Abstract
Zero-shot domain adaptation (ZSDA) presents substantial challenges due to the lack of images in the target domain. Previous approaches leverage Vision-Language Models (VLMs) to tackle this challenge, exploiting their zero-shot learning capabilities. However, these methods primarily address domain distribution shifts and overlook the misalignment between the detection task and VLMs, which rely on manually crafted prompts. To overcome these limitations, we propose the unified prompt and representation enhancement (UPRE) framework, which jointly optimizes both textual prompts and visual representations. Specifically, our approach introduces a multi-view domain prompt that combines linguistic domain priors with detection-specific knowledge, and a visual representation enhancement module that produces domain style variations. Furthermore, we introduce multi-level enhancement strategies,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
