TL;DR
This paper introduces MS-DePro, a multi-modal, multi-source domain adaptation method for object detection that uses depth maps and text to improve localization and classification across domains.
Contribution
The paper proposes MS-DePro, a novel approach leveraging depth and multi-modal prompts for improved multi-source domain adaptation in object detection.
Findings
Achieves state-of-the-art results on multi-source domain adaptation (MSDA) benchmarks.
Depth-guided proposals improve localization accuracy.
Multi-modal feature alignment enhances classification performance.
Abstract
General object detection (OD) struggles to detect objects in a target domain whose distribution differs from that of the training data. Recent studies demonstrate that training on multiple source domains and explicitly processing them separately, as in multi-source domain adaptation (MSDA), outperforms blending them for unsupervised domain adaptation (UDA). However, existing MSDA methods learn domain-agnostic features from domain-specific RGB images while preserving domain-specific information from the domain-agnostic feature map. To address this, we propose MS-DePro: Multi-Source Detector with Depth and Prompt, composed of (1) depth-guided localization and (2) multi-modal guided prompt learning. We leverage domain-agnostic input modalities, namely depth maps and text, to encode domain-agnostic characteristics. Specifically, we utilize depth maps to generate domain-agnostic region…
