From Words to Wavelengths: VLMs for Few-Shot Multispectral Object Detection
Manuel Nkegoum, Minh-Tan Pham, Élisa Fromont, Bruno Avignon, Sébastien Lefèvre

TL;DR
This paper investigates the use of Vision-Language Models (VLMs) for few-shot multispectral object detection, demonstrating their effectiveness in data-scarce scenarios and their ability to transfer semantic knowledge across spectral modalities.
Contribution
The paper adapts two VLM-based detectors, Grounding DINO and YOLO-World, to multispectral inputs and introduces a mechanism to integrate text, visual and thermal modalities, advancing data-efficient multispectral perception.
Findings
VLM-based detectors outperform specialized models in few-shot settings
They achieve competitive results in fully supervised scenarios
Semantic priors from VLMs transfer effectively across spectral modalities
Abstract
Multispectral object detection is critical for safety-sensitive applications such as autonomous driving and surveillance, where robust perception under diverse illumination conditions is essential. However, the limited availability of annotated multispectral data severely restricts the training of deep detectors. In such data-scarce scenarios, textual class information can serve as a valuable source of semantic supervision. Motivated by the recent success of Vision-Language Models (VLMs) in computer vision, we explore their potential for few-shot multispectral object detection. Specifically, we adapt two representative VLM-based detectors, Grounding DINO and YOLO-World, to handle multispectral inputs and propose an effective mechanism to integrate text, visual and thermal modalities. Through extensive experiments on two popular multispectral image benchmarks, FLIR and M3FD, we…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
