Cross-domain Few-shot Object Detection with Multi-modal Textual Enrichment
Zeyu Shangguan, Daniel Seita, Mohammad Rostami

TL;DR
This paper introduces a meta-learning framework that leverages rich textual semantics to improve cross-domain few-shot object detection, effectively addressing domain shift issues by integrating visual and linguistic features.
Contribution
The paper proposes a novel multi-modal architecture with feature aggregation and semantic rectification modules for better domain adaptation in few-shot object detection.
Findings
The method significantly outperforms existing few-shot detection methods on benchmarks.
It aligns visual and linguistic features effectively across domains.
Richer textual descriptions improve detection accuracy.
Abstract
Advancements in cross-modal feature extraction and integration have significantly enhanced performance in few-shot learning tasks. However, current multi-modal object detection (MM-OD) methods often experience notable performance degradation when encountering substantial domain shifts. We propose that incorporating rich textual information can enable the model to establish a more robust knowledge relationship between visual instances and their corresponding language descriptions, thereby mitigating the challenges of domain shift. Specifically, we focus on the problem of Cross-Domain Multi-Modal Few-Shot Object Detection (CDMM-FSOD) and introduce a meta-learning-based framework designed to leverage rich textual semantics as an auxiliary modality to achieve effective domain adaptation. Our new architecture incorporates two key components: (i) A multi-modal feature aggregation module,…
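The abstract is cut off before it describes the aggregation module, so the following is only a generic illustration of one common way to fuse visual and textual features: scaled dot-product cross-attention with a residual connection. It is not the authors' module. The function names (`softmax`, `aggregate`), the two-dimensional toy features, and the residual fusion are all assumptions made for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(visual, text):
    """Hypothetical fusion step: each visual feature attends over all text
    features and adds the attention-weighted text context to itself."""
    dim = len(visual[0])
    fused = []
    for v in visual:
        # Scaled dot-product similarity between this visual feature and each text feature.
        scores = [sum(a * b for a, b in zip(v, t)) / math.sqrt(dim) for t in text]
        weights = softmax(scores)
        # Attention-weighted average of the text features.
        context = [sum(w * t[i] for w, t in zip(weights, text)) for i in range(dim)]
        # Residual connection: keep the visual signal and add the text context.
        fused.append([vi + ci for vi, ci in zip(v, context)])
    return fused


# Toy inputs (assumed): two visual region features and two text embeddings.
visual = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
fused = aggregate(visual, text)
```

In this toy case each visual feature attends most strongly to the text embedding it is most similar to, so the fused vector is pulled toward the matching description. A real detector would use learned projections and much higher-dimensional features.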
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
