Multi-Modal Guided Multi-Source Domain Adaptation for Object Detection

Sangin Lee; Seokjun Kwon; Jeongmin Shin; Namil Kim; Yukyung Choi

arXiv:2605.13140 · cs.CV · May 14, 2026

TL;DR

This paper introduces MS-DePro, a multi-modal, multi-source domain adaptation method for object detection that uses depth maps and text to improve localization and classification across domains.

Contribution

The paper proposes MS-DePro (Multi-Source Detector with Depth and Prompt), which combines depth-guided localization with multi-modal guided prompt learning for multi-source domain adaptation in object detection.

Findings

1. Achieves state-of-the-art results on MSDA benchmarks.
2. Depth-guided proposals improve localization accuracy.
3. Multi-modal feature alignment enhances classification performance.

Abstract

General object detection (OD) struggles to detect objects in the target domain that differ from the training distribution. To address this, recent studies demonstrate that training from multiple source domains and explicitly processing them separately for multi-source domain adaptation (MSDA) outperforms blending them for unsupervised domain adaptation (UDA). However, existing MSDA methods learn domain-agnostic features from domain-specific RGB images while preserving domain-specific information from the domain-agnostic feature map. To address this, we propose MS-DePro: Multi-Source Detector with Depth and Prompt, composed of (1) depth-guided localization and (2) multi-modal guided prompt learning. We leverage domain-agnostic input modalities, namely depth maps and text, to encode domain-agnostic characteristics. Specifically, we utilize depth maps to generate domain-agnostic region…

Code & Models

Repositories

GitHub: sejong-rcv/Multi-Modal-Guided-Multi-Source-Domain-Adaptation-for-Object-Detection
