FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection
Wencong Wu, Xiuwei Zhang, Hanlin Yin, Shun Dai, Hongxi Zhang, Yanning Zhang

TL;DR
FreDFT introduces a frequency domain transformer for improved visible-infrared object detection, effectively addressing information imbalance and enhancing cross-modal feature fusion in complex scenarios.
Contribution
It proposes a novel frequency domain fusion transformer with multimodal attention, cross-modal global modeling, and local feature enhancement modules for superior detection performance.
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively mitigates information imbalance in multimodal data.
Enhances feature fusion through frequency domain attention.
Abstract
Visible-infrared object detection has gained considerable attention due to its detection performance in low light, fog, and rain conditions. However, visible and infrared modalities captured by different sensors suffer from an information imbalance problem in complex scenarios, which can cause inadequate cross-modal fusion and degrade detection performance. Furthermore, most existing methods use transformers in the spatial domain to capture complementary features, ignoring the advantages of developing frequency domain transformers to mine complementary information. To address these weaknesses, we propose a frequency domain fusion transformer, called FreDFT, for visible-infrared object detection. The proposed approach employs a novel multimodal frequency domain attention (MFDA) to mine complementary information between modalities and a frequency domain feed-forward…
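The abstract contrasts spatial-domain and frequency-domain modeling of cross-modal interactions. As a rough illustration of why the frequency domain can express global interactions compactly, the sketch below uses the standard correlation theorem: an element-wise product of one signal's spectrum with the conjugate spectrum of another equals their circular cross-correlation over all positions. This is not the paper's MFDA module, whose details are not given in the text above; the function names, the 1-D toy features, and the naive DFT are all assumptions made for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence (toy helper)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/n normalisation.
        out.append(acc / n if inverse else acc)
    return out


def frequency_domain_correlation(query, key):
    """Circular cross-correlation computed as a product of spectra.

    Hypothetical stand-in for a cross-modal interaction: `query` plays the
    role of a visible feature row and `key` an infrared feature row.
    """
    q_hat = dft(query)
    k_hat = dft(key)
    # Element-wise product with the conjugate spectrum (correlation theorem).
    product = [q * k.conjugate() for q, k in zip(q_hat, k_hat)]
    return [v.real for v in dft(product, inverse=True)]


def spatial_domain_correlation(query, key):
    """Same quantity computed directly: r[s] = sum_t query[(t + s) % n] * key[t]."""
    n = len(query)
    return [
        sum(query[(t + s) % n] * key[t] for t in range(n))
        for s in range(n)
    ]


if __name__ == "__main__":
    visible = [1.0, 2.0, 3.0, 4.0]   # toy "visible" features (assumed values)
    infrared = [0.0, 1.0, 0.0, 0.0]  # toy "infrared" features (assumed values)
    print(frequency_domain_correlation(visible, infrared))
    print(spatial_domain_correlation(visible, infrared))
```

Both functions return the same values up to floating-point error. The frequency-domain route needs only one multiplication per frequency bin once the transforms are available, which is the general appeal of mixing features globally in that domain. A practical implementation would use a fast Fourier transform on 2-D feature maps rather than this naive 1-D transform.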
Taxonomy
Topics: Advanced Neural Network Applications · Infrared Target Detection Methodologies · Image Enhancement Techniques
