Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection

Xiurong Jiang; Lin Zhu; Yifan Hou; Hui Tian

arXiv:2207.03558 · cs.CV · July 11, 2022

TL;DR

This paper introduces a mirror complementary Transformer network (MCNet) for RGB-thermal salient object detection, effectively leveraging hierarchical features and attention mechanisms to improve robustness in challenging scenes, even when one modality fails.

Contribution

The paper proposes a novel Transformer-based model with a mirror structure for robust RGB-T SOD, incorporating attention-based feature interaction and multiscale dilated convolution for improved performance.
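Multiscale dilated convolution enlarges the receptive field without adding weights. A k×k kernel with dilation d covers k + (k - 1)(d - 1) pixels per side. Below is a minimal NumPy sketch of that general idea. It assumes single-channel maps, zero "same" padding, and parallel branches with dilation rates 1, 2 and 3 whose outputs are summed and passed through a ReLU. The function names and the branch layout are illustrative assumptions. They are not the MCNet fusion module from the jxr326/swinmcnet repository.

```python
import numpy as np

# Hypothetical sketch: generic multiscale dilated convolution on a single-channel
# map. The branch layout (parallel, summed, ReLU) is an assumption, not MCNet's module.

def effective_kernel_size(k: int, d: int) -> int:
    """Side length covered by a k x k kernel whose taps are d pixels apart."""
    return k + (k - 1) * (d - 1)

def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, d: int) -> np.ndarray:
    """'Same'-size 2D cross-correlation with dilation d and zero padding."""
    k = kernel.shape[0]                 # assumes a square kernel with odd side
    pad = d * (k // 2)
    xp = np.pad(x, pad)                 # zero padding on every side
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # tap (i, j) reads the input shifted by ((i - k//2) * d, (j - k//2) * d)
            out += kernel[i, j] * xp[i * d : i * d + h, j * d : j * d + w]
    return out

def multiscale_dilated_block(x, kernels, dilations):
    """Run one dilated branch per rate, sum the branches, then apply ReLU."""
    branches = [dilated_conv2d(x, kern, d) for kern, d in zip(kernels, dilations)]
    return np.maximum(sum(branches), 0.0)

# Usage: an impulse reveals which pixels each branch samples.
x = np.zeros((9, 9))
x[4, 4] = 1.0
k3 = np.ones((3, 3))
print([effective_kernel_size(3, d) for d in (1, 2, 3)])   # [3, 5, 7]
y = multiscale_dilated_block(x, [k3, k3, k3], [1, 2, 3])
print(np.count_nonzero(y))                                 # 25
```

With three 3×3 branches, the block covers a 7×7 neighborhood while holding only 27 weights. A single dense 7×7 kernel would need 49.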

Findings

1. Outperforms state-of-the-art CNN- and Transformer-based methods on benchmark datasets.

2. Demonstrates robustness in challenging scenes such as nighttime and complex backgrounds.

3. Introduces a new RGB-T SOD dataset, VT723, for real-world testing.

Abstract

RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair and accurately segment all the pixels belonging to those objects. It is promising in challenging scenes such as nighttime and complex backgrounds because thermal images are insensitive to lighting conditions. The key problem of RGB-T SOD is therefore to make the features from the two modalities complement and adjust each other flexibly. Either modality of an RGB-T image pair will inevitably fail in some challenging scenes, such as extreme lighting conditions and thermal crossover. In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD. Specifically, we introduce a Transformer-based feature extraction module to effectively extract hierarchical features of RGB and thermal images. Then, through the…
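Cross-attention is one common way to let the features of each modality complement and adjust the other. The sketch below assumes one token matrix of shape (N, C) per modality. For brevity it uses identity query/key/value projections, whereas a trained model would learn these projections. The function names are hypothetical and are not taken from MCNet. Each branch attends to the other and adds the result residually, so the exchange is symmetric, loosely echoing a mirrored two-branch design.

```python
import numpy as np

# Hypothetical sketch of symmetric cross-modal attention between RGB and thermal
# tokens. Identity Q/K/V projections are a simplifying assumption; this is not
# MCNet's published interaction module.

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries: np.ndarray, context: np.ndarray):
    """Scaled dot-product attention: tokens of one modality read from the other.

    queries: (Nq, C) tokens of the attending modality
    context: (Nk, C) tokens of the other modality, used as both keys and values
    """
    c = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(c)   # (Nq, Nk) similarity scores
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ context, weights

def mirror_interaction(f_rgb: np.ndarray, f_thermal: np.ndarray):
    """Each branch adds (residually) what it gathers from the other branch."""
    from_thermal, _ = cross_attention(f_rgb, f_thermal)
    from_rgb, _ = cross_attention(f_thermal, f_rgb)
    return f_rgb + from_thermal, f_thermal + from_rgb

# Usage: 6 tokens with 4 channels per modality.
rng = np.random.default_rng(0)
f_rgb = rng.normal(size=(6, 4))
f_thermal = rng.normal(size=(6, 4))
out_rgb, out_thermal = mirror_interaction(f_rgb, f_thermal)
print(out_rgb.shape, out_thermal.shape)   # (6, 4) (6, 4)

# Simulate a failed thermal image with all-zero tokens.
dead = np.zeros_like(f_thermal)
keep_rgb, borrowed = mirror_interaction(f_rgb, dead)
print(np.allclose(keep_rgb, f_rgb), np.allclose(borrowed, f_rgb.mean(axis=0)))  # True True
```

In the toy check with all-zero thermal tokens, the RGB branch passes through unchanged. The thermal branch receives the average RGB token, a crude picture of one modality covering for a failed one. This illustrates the attention arithmetic only. It is not a claim about how MCNet behaves.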

Code & Models

Repositories

jxr326/swinmcnet
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Infrared Target Detection Methodologies

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam