UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement

Xiao Zhang; Fei Wei; Yong Wang; Wenda Zhao; Feiyi Li; Xiangxiang Chu

arXiv:2507.00721 · cs.CV · July 22, 2025

TL;DR

UPRE is a unified framework for zero-shot domain adaptation in object detection. It jointly optimizes textual prompts and visual representations, addressing both domain shift and the misalignment between the detection task and vision-language models.

Contribution

The paper proposes UPRE, a framework that combines multi-view domain prompts with visual representation enhancement to improve zero-shot detection across diverse domains.

Findings

1. Outperforms existing methods on nine benchmark datasets
2. Effectively aligns multi-modal representations at multiple levels
3. Demonstrates significant improvements in zero-shot detection accuracy

Abstract

Zero-shot domain adaptation (ZSDA) presents substantial challenges due to the lack of images in the target domain. Previous approaches leverage Vision-Language Models (VLMs) to tackle this challenge, exploiting their zero-shot learning capabilities. However, these methods primarily address domain distribution shifts and overlook the misalignment between the detection task and VLMs, which rely on manually crafted prompts. To overcome these limitations, we propose the unified prompt and representation enhancement (UPRE) framework, which jointly optimizes both textual prompts and visual representations. Specifically, our approach introduces a multi-view domain prompt that combines linguistic domain priors with detection-specific knowledge, and a visual representation enhancement module that produces domain style variations. Furthermore, we introduce multi-level enhancement strategies,…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis