DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection
Junjie Guo, Chenqiang Gao, Fangcen Liu, Deyu Meng

TL;DR
DPDETR introduces a novel transformer-based approach for infrared-visible object detection that explicitly models object and modality positions, effectively handling misalignment issues and improving detection accuracy.
Contribution
The paper proposes a decoupled position detection transformer with a multispectral cross-attention module and a decoupled decoder, addressing modality misalignment in infrared-visible object detection.
Findings
Significant performance improvements on DroneVehicle and KAIST datasets.
Effective handling of modality misalignment through decoupled position modeling.
Enhanced learning of intrinsic object relationships with the proposed strategies.
Abstract
Infrared-visible object detection aims to achieve robust object detection by leveraging the complementary information of infrared and visible image pairs. However, the common modality misalignment problem presents two challenges: fusing misaligned complementary features is difficult, and current methods cannot reliably locate objects in both modalities under misalignment conditions. In this paper, we propose a Decoupled Position Detection Transformer (DPDETR) to address these issues. Specifically, we explicitly define the object category, visible modality position, and infrared modality position to enable the network to learn the intrinsic relationships and reliably output the positions of objects in both modalities. To fuse misaligned object features reliably, we propose a Decoupled Position Multispectral Cross-attention module that adaptively samples and aggregates…
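The idea of sampling each modality at its own position and then aggregating can be illustrated with a toy sketch. This is not the paper's module: the function names (`bilinear_sample`, `fuse_decoupled`), the single sampling point per modality, and the two-way softmax weighting are all assumptions made for illustration, since the abstract above is truncated before it describes the actual design.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample a (H, W, C) feature map at continuous pixel coordinates (x, y)."""
    h, w, _ = feat.shape
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fuse_decoupled(vis_feat, ir_feat, vis_xy, ir_xy, logits):
    """Hypothetical fusion: sample each modality at its OWN position,
    then mix the two samples with softmax attention weights."""
    samples = np.stack([
        bilinear_sample(vis_feat, *vis_xy),   # visible feature at visible position
        bilinear_sample(ir_feat, *ir_xy),     # infrared feature at infrared position
    ])                                        # shape (2, C)
    weights = softmax(np.asarray(logits, dtype=float))  # shape (2,)
    return weights @ samples                  # shape (C,)

# Toy usage: the same object sits one pixel further right in the infrared image.
vis_feat = np.ones((4, 4, 2))
ir_feat = 3.0 * np.ones((4, 4, 2))
fused = fuse_decoupled(vis_feat, ir_feat, vis_xy=(1.5, 2.0), ir_xy=(2.5, 2.0),
                       logits=[0.0, 0.0])
```

Because the two positions are passed separately, a spatial offset between the modalities does not force both feature maps to be read at the same, possibly wrong, location. In a learned model the positions and the attention logits would be predicted from the query rather than supplied by hand.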
Taxonomy
Topics: Robotics and Sensor-Based Localization · Infrared Target Detection Methodologies
Methods: Linear Layer · Concatenated Skip Connection · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings
