Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection
Xiurong Jiang, Lin Zhu, Yifan Hou, Hui Tian

TL;DR
This paper introduces a mirror complementary Transformer network (MCNet) for RGB-thermal salient object detection, effectively leveraging hierarchical features and attention mechanisms to improve robustness in challenging scenes, even when one modality fails.
Contribution
The paper proposes a novel Transformer-based model with a mirror structure for robust RGB-T SOD, incorporating attention-based feature interaction and multiscale dilated convolution for improved performance.
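As a rough illustration of the multiscale dilated convolution idea mentioned above, the sketch below runs the same small kernel at several dilation rates and fuses the branches. It is a minimal 1D sketch in pure Python, not the paper's module: the function names, the dilation rates (1, 2, 4), and fusion by averaging are all assumptions made for illustration, since this summary does not describe MCNet's actual layer configuration.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1D dilated cross-correlation with zero padding.

    Kernel taps are spaced `dilation` samples apart, so a larger dilation
    covers a wider context with the same number of weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):  # taps outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multiscale_dilated(signal: Sequence[float], kernel: Sequence[float],
                       dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Hypothetical fusion: average the outputs of parallel dilated branches."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span (in samples) covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


# An impulse input makes the sampling pattern of each branch visible.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
single = dilated_conv1d(impulse, [1, 1, 1], dilation=2)
fused = multiscale_dilated(impulse, [1, 1, 1])
```

With a 3-tap kernel, the dilation-4 branch spans 9 samples while the dilation-1 branch spans 3, which is why combining branches lets a layer respond to both small and large salient regions without adding weights. In a 2D network the same effect is usually obtained through the dilation argument of a standard convolution layer.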
Findings
Outperforms state-of-the-art CNN and Transformer methods on benchmark datasets.
Demonstrates robustness in challenging scenes like nighttime and complex backgrounds.
Introduces a new RGB-T SOD dataset VT723 for real-world testing.
Abstract
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair and accurately segment all the pixels belonging to those objects. It is promising in challenging scenes such as nighttime and complex backgrounds because thermal images are insensitive to lighting conditions. Thus, the key problem of RGB-T SOD is to make the features from the two modalities complement and adjust each other flexibly, since either modality of an RGB-T image pair will inevitably fail in challenging scenes such as extreme lighting conditions and thermal crossover. In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD. Specifically, we introduce a Transformer-based feature extraction module to effectively extract hierarchical features of RGB and thermal images. Then, through the…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Infrared Target Detection Methodologies
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam
