Leveraging Language Prior for Infrared Small Target Detection
Pranav Singh, Pravendra Singh

TL;DR
This paper introduces a multimodal infrared small target detection (IRSTD) framework that uses language priors to guide detection. Text descriptions of target locations are generated with the GPT-4 vision model, which adds a text modality to image-only datasets and improves performance over existing methods.
Contribution
It proposes a novel multimodal IRSTD approach using language priors and creates a new dataset combining image and text modalities for small target detection.
Findings
Significant improvements in IoU, nIoU, probability of detection (Pd), and false-alarm rate (Fa) over state-of-the-art methods.
Effective use of GPT-4 vision model for generating text-based target location descriptions.
Validation through extensive experiments and ablation studies.
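The pixel-level metrics named in the findings can be sketched as follows. This is a minimal illustration under common IRSTD conventions, not the paper's evaluation code: IoU is accumulated over the whole dataset, nIoU averages the per-image IoU, and Fa is taken as the fraction of pixels falsely predicted as target. The function names, the binary-mask input format, and the choice to score an image with an empty union as 1.0 are all assumptions. Pd is omitted because it needs target-level (connected-component) matching.

```python
import numpy as np

def iou_and_niou(preds, masks):
    """Return (IoU, nIoU) for paired lists of binary prediction and ground-truth masks.

    IoU  : total intersection / total union over all images.
    nIoU : mean of the per-image IoU values.
    Assumption: an image whose union is empty counts as a perfect match (1.0).
    """
    inter_total = 0
    union_total = 0
    per_image = []
    for p, m in zip(preds, masks):
        p = np.asarray(p, dtype=bool)
        m = np.asarray(m, dtype=bool)
        inter = int(np.logical_and(p, m).sum())
        union = int(np.logical_or(p, m).sum())
        inter_total += inter
        union_total += union
        per_image.append(inter / union if union > 0 else 1.0)
    iou = inter_total / union_total if union_total > 0 else 1.0
    return float(iou), float(np.mean(per_image))

def false_alarm_rate(preds, masks):
    """Pixel-level Fa (assumed definition): false-positive pixels / all pixels."""
    false_pos = 0
    total = 0
    for p, m in zip(preds, masks):
        p = np.asarray(p, dtype=bool)
        m = np.asarray(m, dtype=bool)
        false_pos += int(np.logical_and(p, ~m).sum())
        total += p.size
    return false_pos / total

# Two toy 2x2 images: the first has one false-positive pixel, the second is exact.
preds = [[[1, 1], [0, 0]], [[0, 0], [0, 1]]]
masks = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
iou, niou = iou_and_niou(preds, masks)
fa = false_alarm_rate(preds, masks)
```

In the toy example the dataset-level IoU (2/3) is lower than the nIoU (0.75) because the single imperfect image contributes more pixels to the pooled union, which is why the two metrics are usually reported together.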
Abstract
IRSTD (InfraRed Small Target Detection) detects small targets against blurry infrared backgrounds and is essential for various applications. The detection task is challenging due to the small size of the targets and their sparse distribution in infrared small target datasets. Although existing IRSTD methods and datasets have led to significant advancements, they are limited by their reliance solely on the image modality. Recent advances in deep learning and large vision-language models have shown remarkable performance in various visual recognition tasks. In this work, we propose a novel multimodal IRSTD framework that incorporates language priors to guide small target detection. We leverage language-guided attention weights derived from the language prior to enhance the model's IRSTD capability, presenting a novel approach that combines textual information with image data to improve IRSTD…
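One way to picture "language-guided attention weights" is the sketch below. It is an illustrative assumption, not the architecture described in the paper: a single text embedding is compared with every spatial location of an image feature map by scaled dot product, the scores are normalized with a softmax over locations, and the resulting weights rescale the features residually. The function name, tensor shapes, and the residual `1 + weight` form are all hypothetical.

```python
import numpy as np

def language_guided_attention(image_feats, text_emb):
    """Reweight image features by their similarity to a text embedding.

    image_feats : array of shape (H, W, D), hypothetical image feature map.
    text_emb    : array of shape (D,), hypothetical embedding of the text prior.
    Returns (weights, attended) where weights has shape (H, W) and sums to 1.
    """
    h, w, d = image_feats.shape
    # Scaled dot-product similarity between each location and the text embedding.
    scores = image_feats.reshape(-1, d) @ text_emb / np.sqrt(d)
    # Numerically stable softmax over all spatial locations.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()
    weights = weights.reshape(h, w)
    # Residual reweighting (an assumed design): features are amplified, never zeroed.
    attended = image_feats * (1.0 + weights[..., None])
    return weights, attended

# Toy check: only location (1, 2) carries the direction of the text embedding.
text_emb = np.array([1.0, 0.0, 2.0, 0.0])
feats = np.zeros((3, 4, 4))
feats[1, 2] = text_emb
weights, attended = language_guided_attention(feats, text_emb)
peak = np.unravel_index(np.argmax(weights), weights.shape)
```

Here `peak` identifies the location whose features align best with the text embedding, which is the intuition behind using a textual location description to steer a detector toward a small target.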
Taxonomy
Topics: Infrared Target Detection Methodologies
