Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment
Sina Malakouti, Adriana Kovashka

TL;DR
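This paper introduces a semi-supervised domain generalization method for object detection that builds on vision-language pre-training and aligns features in the language space rather than the visual space, reporting improvements of 11.7% in the domain generalization (DG) setting and 7.5% in the domain adaptation (DA) setting over existing methods.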
Contribution
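It proposes Cross-Domain Descriptive Multi-Scale Learning (CDDMSL), a novel approach that maximizes the agreement, in the embedding space, between descriptions of an image presented with different domain-specific characteristics, in order to improve domain generalization in object detection.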
Findings
Achieves 11.7% improvement in the domain generalization (DG) setting
Achieves 7.5% improvement in the domain adaptation (DA) setting
Outperforms existing methods significantly
Abstract
Existing domain adaptation (DA) and generalization (DG) methods in object detection enforce feature alignment in the visual space but face challenges like object appearance variability and scene complexity, which make it difficult to distinguish between objects and achieve accurate detection. In this paper, we are the first to address the problem of semi-supervised domain generalization by exploring vision-language pre-training and enforcing feature alignment through the language space. We employ a novel Cross-Domain Descriptive Multi-Scale Learning (CDDMSL) aiming to maximize the agreement between descriptions of an image presented with different domain-specific characteristics in the embedding space. CDDMSL significantly outperforms existing methods, achieving 11.7% and 7.5% improvement in DG and DA settings, respectively. Comprehensive analysis and ablation studies confirm the…
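The idea of maximizing agreement between descriptions of the same image under different domain-specific characteristics can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, so the cosine-based form below is an assumption, and the function names (`cosine_similarity`, `description_agreement_loss`) and the toy embeddings are hypothetical. It only shows what "agreement in the embedding space" could mean for paired description embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def description_agreement_loss(
    original_embs: Sequence[Sequence[float]],
    shifted_embs: Sequence[Sequence[float]],
) -> float:
    """Mean (1 - cosine similarity) over paired description embeddings.

    original_embs[i] and shifted_embs[i] are assumed to embed descriptions
    of the same image shown with different domain characteristics.
    Minimizing this value maximizes their agreement. The choice of a
    cosine-based loss is an illustrative assumption, not the paper's loss.
    """
    if len(original_embs) != len(shifted_embs) or len(original_embs) == 0:
        raise ValueError("need the same non-zero number of embeddings")
    total = sum(
        1.0 - cosine_similarity(a, b)
        for a, b in zip(original_embs, shifted_embs)
    )
    return total / len(original_embs)


# Toy usage with made-up 3-dimensional embeddings.
original = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
shifted = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
loss = description_agreement_loss(original, shifted)
```

In this toy example the first pair points in the same direction (no penalty) and the second pair is orthogonal (penalty of 1), so the loss averages the two. In a real pipeline the embeddings would come from a language or captioning model, which this sketch leaves out.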
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
