SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection
Shuhan Dong, Yunsong Li, Weiying Xie, Jiaqing Zhang, Jiayuan Tian,, Danian Yang, Jie Lei

TL;DR
SeaDATE introduces a dual attention fusion module and contrastive learning to enhance multimodal object detection, effectively integrating local and global features across network layers for improved accuracy.
Contribution
The paper proposes a novel dual attention feature fusion module guided by Transformers and a contrastive learning approach to better extract deep semantic features in multimodal detection.
Findings
Achieves state-of-the-art performance on FLIR, LLVIP, and M3FD datasets.
Demonstrates the effectiveness of dual attention fusion in integrating modal features.
Validates the superiority of contrastive learning in capturing deep semantic information.
Abstract
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors. By learning long-term dependencies, Transformer can effectively integrate multimodal features in the feature extraction stage, which greatly improves the performance of multimodal object detection. However, current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of network, thus limiting the improvements in detection performance. In this paper, we introduce an accurate and efficient object detection method named SeaDATE. Initially, we propose a novel dual attention Feature Fusion (DTF) module that, under Transformer's guidance, integrates local and global information through a dual attention mechanism, strengthening the fusion of modal features from orthogonal perspectives using…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling
MethodsDense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer
