ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection
Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang

TL;DR
ICAFusion fuses multispectral features with a dual cross-attention transformer framework and an iterative interaction mechanism, improving detection robustness to image misalignment while reducing model complexity.
Contribution
The paper proposes a novel dual cross-attention transformer framework with iterative parameter sharing for multispectral object detection, enhancing feature fusion and reducing complexity.
Findings
Achieves superior detection performance on KAIST, FLIR, and VEDAI datasets.
Provides faster inference compared to existing methods.
Demonstrates robustness to image misalignment.
Abstract
Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment because of their inherent deficiency in local-range feature interaction, which degrades performance. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and capture complementary information across modalities simultaneously. This framework enhances the discriminability of object features through the query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired…
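The idea in the abstract, two cross-attention branches that exchange information between modalities and are applied repeatedly with the same weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head attention, the residual connection, the choice of which modality supplies the queries, the channel-wise concatenation at the end, and all names (`cross_attention`, `iterative_dual_cross_attention`, `params`, `num_iters`) are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head cross-attention: queries from one modality, keys/values from the other.

    Assumption: a plain residual connection stands in for a full transformer block.
    """
    q = matmul(query_feats, w_q)
    k = matmul(context_feats, w_k)
    v = matmul(context_feats, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    attended = matmul([softmax(r) for r in scores], v)
    return [[x + o for x, o in zip(xr, orow)] for xr, orow in zip(query_feats, attended)]


def iterative_dual_cross_attention(rgb, thermal, params, num_iters=2):
    """Apply both cross-attention branches num_iters times, reusing the same weights.

    Because the weights are shared across iterations, the parameter count
    does not grow with num_iters (unlike stacking separate blocks).
    """
    for _ in range(num_iters):
        new_rgb = cross_attention(rgb, thermal, *params["rgb"])
        new_thermal = cross_attention(thermal, rgb, *params["thermal"])
        rgb, thermal = new_rgb, new_thermal
    # Assumption: fuse by concatenating the two enhanced features per token.
    return [r + t for r, t in zip(rgb, thermal)]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights or features for the demo."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens, dim = 4, 3
params = {
    "rgb": [random_matrix(dim, dim, rng) for _ in range(3)],      # W_q, W_k, W_v
    "thermal": [random_matrix(dim, dim, rng) for _ in range(3)],  # W_q, W_k, W_v
}
rgb_feats = random_matrix(tokens, dim, rng)
thermal_feats = random_matrix(tokens, dim, rng)
fused = iterative_dual_cross_attention(rgb_feats, thermal_feats, params, num_iters=3)
print(len(fused), len(fused[0]))  # 4 tokens, 6 channels after concatenation
```

In this sketch, raising `num_iters` refines the features further without adding weights, which is the trade-off the abstract contrasts with stacking multiple transformer blocks.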
Taxonomy
Topics: Remote-Sensing Image Classification · Infrared Target Detection Methodologies · Advanced Image and Video Retrieval Techniques
