Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection
Chao Zeng, Sam Kwong

TL;DR
This paper introduces a Dual Swin-Transformer based network that effectively models long-range dependencies and leverages cross-modality attention for improved RGB-D salient object detection.
Contribution
It proposes a novel dual Swin-Transformer architecture with attention modules and a multi-stage decoding process for enhanced RGB-D saliency detection.
Findings
Outperforms state-of-the-art methods on five benchmark datasets.
Effectively models long-range dependencies in visual features.
Utilizes attention mechanisms to fuse RGB and depth information.
Abstract
Salient object detection is the task of predicting the human-attended region in a given scene. Fusing depth information has proven effective in this task. The main challenge is how to aggregate the complementary information from the RGB and depth modalities. However, conventional deep models rely heavily on CNN feature extractors, and long-range contextual dependencies are usually ignored. In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network. We adopt Swin-Transformer as the feature extractor for both the RGB and depth modalities to model the long-range dependencies in visual inputs. Before fusing the two branches of features into one, attention-based modules are applied to enhance features from each modality. We design a self-attention-based cross-modality interaction module and a gated modality attention module to leverage the…
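To make the two attention ideas named in the abstract more concrete, here is a minimal NumPy sketch of (a) cross-modality attention, where tokens from one modality attend over tokens from the other, and (b) a sigmoid gate that mixes the two enhanced feature sets. This is a generic illustration, not the authors' modules: the function names (`cross_attention`, `gated_fusion`), the token count, the channel width, the random projection weights, and the residual connection are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Hypothetical cross-modality attention: queries come from one
    modality, keys and values from the other."""
    q = query_feats @ w_q                      # (N, d)
    k = context_feats @ w_k                    # (N, d)
    v = context_feats @ w_v                    # (N, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, N) scaled dot products
    weights = softmax(scores, axis=-1)         # each row sums to 1
    # Residual connection (an assumption) keeps the original features.
    return query_feats + weights @ v, weights

def gated_fusion(rgb, depth, w_gate, b_gate):
    """Hypothetical gated fusion: a per-token, per-channel gate in (0, 1)
    decides how much of each modality enters the fused feature."""
    logits = np.concatenate([rgb, depth], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-logits))       # sigmoid
    return gate * rgb + (1.0 - gate) * depth, gate

# Toy inputs standing in for one stage of backbone features:
# 49 tokens (e.g. a 7x7 grid) with 32 channels per modality.
rng = np.random.default_rng(0)
n_tokens, dim = 49, 32
rgb_tokens = rng.standard_normal((n_tokens, dim))
depth_tokens = rng.standard_normal((n_tokens, dim))

def proj():
    # Random projection matrix; a trained model would learn these.
    return rng.standard_normal((dim, dim)) / np.sqrt(dim)

# RGB attends to depth, and depth attends to RGB.
rgb_enh, attn_rd = cross_attention(rgb_tokens, depth_tokens, proj(), proj(), proj())
depth_enh, attn_dr = cross_attention(depth_tokens, rgb_tokens, proj(), proj(), proj())

w_gate = rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim)
fused, gate = gated_fusion(rgb_enh, depth_enh, w_gate, np.zeros(dim))
print(fused.shape)  # (49, 32)
```

Because every token attends to every token of the other modality, the attention matrix is dense over the whole feature map, which is the sense in which such modules capture long-range dependencies. The gate produces a convex combination, so each fused value lies between the corresponding RGB and depth values.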
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Virtual Reality Applications and Impacts
Methods: Convolution · Dense Connections
