Learning Multi-Modal Prototypes for Cross-Domain Few-Shot Object Detection

Wanqi Wang; Jingcai Guo; Yuxiang Cai; Zhi Chen

arXiv:2602.18811 · cs.CV · February 24, 2026

TL;DR

This paper introduces LMP, a dual-branch detector that combines textual guidance with visual prototypes to improve cross-domain few-shot object detection, and reports state-of-the-art results on six cross-domain benchmarks.

Contribution

It proposes a multi-modal prototype learning framework that integrates visual exemplars and text prompts to improve detection in unseen domains from only a few labeled examples.
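The summary says the detector draws on both text prompts and visual exemplars, but this page does not say how the two sources of evidence are combined. The NumPy sketch below is one simple way to picture it, assumed purely for illustration and not taken from the paper: each query region is scored by cosine similarity against a per-class text embedding and a per-class visual prototype, and the two sets of logits are mixed with a weight `alpha`. The function names, `alpha` and the temperature `tau` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: the fusion rule (weighted sum of cosine logits), alpha and tau
# are illustrative assumptions, not design choices or values taken from the paper.

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fused_class_logits(roi_feats, text_embeds, visual_protos, alpha=0.5, tau=0.07):
    """Score query RoIs against per-class text embeddings and visual prototypes.

    roi_feats:     (R, D) features of candidate regions in a query image
    text_embeds:   (C, D) one text-prompt embedding per class (text branch)
    visual_protos: (C, D) one visual prototype per class (visual branch)
    Returns (R, C) logits: alpha * text logits + (1 - alpha) * visual logits.
    """
    rois = l2_normalize(roi_feats)
    text_logits = rois @ l2_normalize(text_embeds).T / tau
    visual_logits = rois @ l2_normalize(visual_protos).T / tau
    return alpha * text_logits + (1.0 - alpha) * visual_logits

rng = np.random.default_rng(0)
rois, text, protos = rng.normal(size=(4, 16)), rng.normal(size=(3, 16)), rng.normal(size=(3, 16))
logits = fused_class_logits(rois, text, protos)
print(logits.shape)  # (4, 3)
```

Setting `alpha=1.0` reduces the sketch to a text-only open-vocabulary classifier, while `alpha=0.0` scores regions purely by similarity to target-domain exemplars; intermediate values let visual evidence complement the domain-invariant text prompts.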

Findings

1. Achieves state-of-the-art mAP on six cross-domain benchmarks.
2. Effectively combines visual and textual information for detection.
3. Performs well across 1-, 5-, and 10-shot settings.

Abstract

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel classes in unseen target domains given only a few labeled examples. While open-vocabulary detectors built on vision-language models (VLMs) transfer well, they depend almost entirely on text prompts, which encode domain-invariant semantics but miss domain-specific visual information needed for precise localization under few-shot supervision. We propose a dual-branch detector that Learns Multi-modal Prototypes, dubbed LMP, by coupling textual guidance with visual exemplars drawn from the target domain. A Visual Prototype Construction module aggregates class-level prototypes from support RoIs and dynamically generates hard-negative prototypes in query images via jittered boxes, capturing distractors and visually similar backgrounds. In the visual-guided branch, we inject these prototypes into the detection pipeline with…
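The abstract describes the Visual Prototype Construction module only at a high level. The NumPy sketch below illustrates the two ideas under stated assumptions and is not the authors' implementation: class prototypes are taken as the mean of support RoI features, and hard-negative boxes are produced by uniformly jittering ground-truth boxes in a query image and keeping only candidates whose IoU with every ground-truth box falls below a threshold; features pooled inside those boxes are then averaged into one hard-negative prototype. The function names, the jitter scale of 0.5, the 0.3 IoU threshold, and the simple grid mean-pooling (a stand-in for a detector's RoI feature extractor) are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: names, jitter scale, IoU threshold and pooling are assumptions.

def class_prototypes(support_feats, support_labels, num_classes):
    """Mean-pool support RoI features into one prototype per class.

    support_feats: (N, D) RoI features; support_labels: (N,) ints in [0, num_classes).
    Assumes every class has at least one support example (1-shot or more).
    """
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def box_iou(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def jittered_negative_boxes(gt_boxes, rng, num_per_box=32, scale=0.5, max_iou=0.3):
    """Randomly perturb each GT box; keep candidates whose IoU with every GT is < max_iou."""
    wh = np.stack([gt_boxes[:, 2] - gt_boxes[:, 0], gt_boxes[:, 3] - gt_boxes[:, 1]], axis=1)
    sizes = np.concatenate([wh, wh], axis=1)  # (G, 4): jitter each corner by a fraction of box size
    noise = rng.uniform(-scale, scale, size=(len(gt_boxes), num_per_box, 4))
    cand = (gt_boxes[:, None, :] + noise * sizes[:, None, :]).reshape(-1, 4)
    # Reorder corners so that x1 <= x2 and y1 <= y2 after jittering.
    cand = np.concatenate([np.minimum(cand[:, :2], cand[:, 2:]),
                           np.maximum(cand[:, :2], cand[:, 2:])], axis=1)
    keep = box_iou(cand, gt_boxes).max(axis=1) < max_iou
    return cand[keep]

def roi_mean_pool(feat_map, box):
    """Average an (H, W, D) feature map over the grid cells a box touches (feature-map coords)."""
    h, w, _ = feat_map.shape
    c0 = int(np.clip(np.floor(box[0]), 0, w - 1))
    r0 = int(np.clip(np.floor(box[1]), 0, h - 1))
    c1 = int(np.clip(np.ceil(box[2]), c0 + 1, w))
    r1 = int(np.clip(np.ceil(box[3]), r0 + 1, h))
    return feat_map[r0:r1, c0:c1].mean(axis=(0, 1))

def hard_negative_prototype(feat_map, gt_boxes, rng, **jitter_kwargs):
    """Pool features inside jittered negative boxes and average them into one prototype."""
    boxes = jittered_negative_boxes(gt_boxes, rng, **jitter_kwargs)
    if len(boxes) == 0:
        return None  # no candidate survived the IoU filter
    return np.mean([roi_mean_pool(feat_map, b) for b in boxes], axis=0)
```

Because the negative boxes are resampled for every query image, the hard-negative prototype in this sketch changes from image to image, which is one plausible reading of "dynamically generates"; boxes that partly overlap an object tend to capture the distractors and look-alike background the abstract mentions.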

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications