Disentangle Object and Non-object Infrared Features via Language Guidance
Fan Liu, Ting Wu, Chuanyi Zhang, Liang Yao, Xing Ma, Yuhui Zheng

TL;DR
This paper introduces a novel vision-language learning framework for infrared object detection that uses textual guidance to disentangle object and non-object features, improving detection accuracy in challenging environments.
Contribution
It proposes Semantic Feature Alignment (SFA) and Object Feature Disentanglement (OFD) modules that leverage textual supervision to enhance infrared object detection.
Findings
Achieves 83.7% mAP on the M3FD benchmark.
Achieves 86.1% mAP on the FLIR benchmark.
Outperforms existing methods in infrared object detection.
Abstract
Infrared object detection focuses on identifying and locating objects in complex environments (e.g., darkness, snow, and rain) where visible-light cameras are hampered by poor illumination. However, due to the low contrast and weak edge information of infrared images, it is challenging to extract discriminative object features for robust detection. To address this issue, we propose a novel vision-language representation learning paradigm for infrared object detection. Additional textual supervision with rich semantic information is explored to guide the disentanglement of object and non-object features. Specifically, we propose a Semantic Feature Alignment (SFA) module to align the object features with the corresponding text features. Furthermore, we develop an Object Feature Disentanglement (OFD) module that disentangles text-aligned object features and non-object features by minimizing…
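The SFA idea of aligning each object feature with the text feature of its class can be illustrated with a CLIP-style contrastive objective. This is a minimal sketch under stated assumptions, not the paper's actual loss: it assumes one pooled feature vector per detected object and one text embedding per class, and the names `cosine` and `alignment_loss`, the cosine-similarity logits, and the `temperature` default are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def alignment_loss(object_feats: List[List[float]],
                   labels: List[int],
                   text_feats: List[List[float]],
                   temperature: float = 0.07) -> float:
    """Hypothetical SFA-style loss (an assumption, not the authors' formulation).

    For each object feature, compute temperature-scaled cosine similarities to
    every class text embedding and apply cross-entropy against the object's
    ground-truth class, pulling the feature toward its own class's text feature.
    """
    total = 0.0
    for feat, label in zip(object_feats, labels):
        # Similarity of this object feature to each class's text embedding.
        logits = [cosine(feat, t) / temperature for t in text_feats]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the correct class.
        total += log_z - logits[label]
    return total / len(object_feats)


# Usage with made-up 2-D embeddings for two hypothetical classes ("person", "car").
text_feats = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss([[0.9, 0.1]], [0], text_feats)     # feature near "person", labeled "person"
misaligned = alignment_loss([[0.9, 0.1]], [1], text_feats)  # same feature, labeled "car"
# The aligned case should yield the smaller loss.
```

A contrastive form is used here because it both attracts an object feature to its own class text embedding and repels it from the others; the real SFA module may use a different similarity measure or objective.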
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
