RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision
Jinzhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning Zhang

TL;DR
This paper introduces a simple yet effective multi-modal attention module and supervision strategy to improve RGB-T object detection, achieving state-of-the-art accuracy on challenging benchmarks with high efficiency.
Contribution
The paper proposes a novel Group Shuffled Multi-receptive Attention module and a multi-modal supervision method for enhanced RGB-T object detection.
Findings
Achieves state-of-the-art accuracy on KAIST and DroneVehicle benchmarks.
Maintains high efficiency while improving detection performance.
Effectively exploits complementarity between RGB and thermal modalities.
Abstract
Multispectral object detection, which uses both visible (RGB) and thermal infrared (T) modalities, has attracted significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between the RGB and thermal modalities while maintaining efficiency remains a critical challenge. In this paper, a simple Group Shuffled Multi-receptive Attention (GSMA) module is proposed to extract and combine multi-scale RGB and thermal features. The extracted multi-modal features are then integrated directly with a multi-level path aggregation neck, which significantly improves both fusion effectiveness and efficiency. Meanwhile, multi-modal object detection often adopts union annotations shared by both modalities. Such supervision is insufficient and unfair, since an object observed in one modality may not be visible in the other. To solve this issue,…
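The abstract does not spell out the internals of GSMA, so the following is only a minimal sketch of what "group shuffled" fusion of RGB and thermal channels could look like. It assumes a ShuffleNet-style channel shuffle applied to concatenated RGB and thermal feature channels, followed by splitting into groups that are each routed to a branch with a different receptive field. The helper names `channel_shuffle` and `split_groups`, and the per-branch `KERNEL_SIZES`, are hypothetical illustrations, not the authors' code; channels are represented by string labels so the mixing is easy to see.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """ShuffleNet-style channel shuffle.

    View the C channels as a (groups, C // groups) grid, transpose it and
    flatten it, so neighbouring output channels come from different groups.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be a positive multiple of groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def split_groups(channels: Sequence[T], groups: int) -> List[List[T]]:
    """Split a channel list into `groups` equal, contiguous chunks."""
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be a positive multiple of groups")
    per_group = len(channels) // groups
    return [list(channels[k * per_group:(k + 1) * per_group]) for k in range(groups)]


# Hypothetical per-branch kernel sizes: one receptive field per group (assumption).
KERNEL_SIZES = [3, 5, 7, 9]

rgb = [f"R{i}" for i in range(4)]      # stand-ins for RGB feature channels
thermal = [f"T{i}" for i in range(4)]  # stand-ins for thermal feature channels

# Concatenation gives [R0..R3, T0..T3]; shuffling with 2 groups (one per
# modality) interleaves them as [R0, T0, R1, T1, ...].
fused = channel_shuffle(rgb + thermal, groups=2)
branches = split_groups(fused, groups=len(KERNEL_SIZES))

for k, (branch, ks) in enumerate(zip(branches, KERNEL_SIZES)):
    print(f"branch {k}: channels={branch}, kernel={ks}x{ks}")
```

The shuffle is what makes the grouping useful for fusion under this assumption: splitting the unshuffled concatenation into four contiguous groups would give `[['R0', 'R1'], ['R2', 'R3'], ['T0', 'T1'], ['T2', 'T3']]`, so every branch would see only one modality. After shuffling, each branch receives one RGB and one thermal channel.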
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Industrial Vision Systems and Defect Detection
