$\mathbf{C}^2$Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection
Maoxun Yuan, Xingxing Wei

TL;DR
This paper introduces $\mathbf{C}^2$Former, a transformer-based approach that calibrates and fuses RGB and IR images for robust object detection, addressing modality miscalibration and fusion imprecision.
Contribution
The paper proposes a novel $\mathbf{C}^2$Former with an Inter-modality Cross-Attention module and an Adaptive Feature Sampling module for improved RGB-IR object detection.
Findings
Achieves robust detection on DroneVehicle and KAIST datasets.
Effectively utilizes RGB-IR complementary information.
Compatible with existing object detectors.
Abstract
Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors have become more reliable and robust in practical applications by using combined RGB-IR information. However, existing methods still suffer from modality miscalibration and fusion imprecision problems. Since transformers have a powerful capability to model the pairwise correlations between different features, in this paper we propose a novel Calibrated and Complementary Transformer called $\mathbf{C}^2$Former to address these two problems simultaneously. In $\mathbf{C}^2$Former, we design an Inter-modality Cross-Attention (ICA) module to obtain the calibrated and complementary features by learning the cross-attention relationship between the RGB…
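To make the idea of cross-attention between modalities concrete, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python, where tokens from one modality act as queries and tokens from the other act as keys and values. It is not the paper's ICA module: the function names, the toy token values, and the omission of learned query/key/value projections, multiple heads, and the Adaptive Feature Sampling module are all simplifying assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of d-dim tokens from one modality (e.g. RGB).
    keys, values: lists of tokens from the other modality (e.g. IR).
    Returns one attended vector per query token.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


# Hypothetical toy tokens: 2 RGB tokens attend over 3 IR tokens.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
ir_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rgb_attended = cross_attention(rgb_tokens, ir_tokens, ir_tokens)
```

Each output vector is a convex combination of the IR tokens, weighted by how strongly the corresponding RGB token matches each of them. Swapping the arguments gives the IR-to-RGB direction.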
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Absolute Position Encodings · Label Smoothing · Dense Connections · Adam · Byte Pair Encoding · Residual Connection · Softmax
