Transformer-based Network for RGB-D Saliency Detection

Yue Wang; Xu Jia; Lu Zhang; Yuke Li; James Elder; Huchuan Lu

arXiv:2112.00582 · cs.CV · December 2, 2021 · 6 citations

TL;DR

This paper introduces a transformer-based network for RGB-D saliency detection that captures long-range dependencies and fuses multi-scale, multi-modal features, outperforming state-of-the-art methods on six benchmark datasets.

Contribution

The paper proposes a transformer-based architecture built from two modules, a within-modality feature enhancement module (TWFEM) and a feature fusion module (TFFM), which simplifies the model design and improves RGB-D saliency detection performance.

Findings

01. Outperforms state-of-the-art methods on six benchmark datasets.
02. Captures long-range dependencies during multi-scale, multi-modal feature fusion.
03. Simplifies model design by relying on transformer operations.

Abstract

RGB-D saliency detection integrates information from both RGB images and depth maps to improve the prediction of salient regions under challenging conditions. The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities. Previous approaches tend to apply multi-scale and multi-modal fusion separately via local operations, which fails to capture long-range dependencies. Here we propose a transformer-based network to address this issue. Our proposed architecture is composed of two modules: a transformer-based within-modality feature enhancement module (TWFEM) and a transformer-based feature fusion module (TFFM). TFFM performs thorough feature fusion by integrating features from multiple scales and both modalities over all positions simultaneously. TWFEM enhances the features at each scale by selecting and integrating complementary…
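
The idea behind TFFM, fusing features from multiple scales and both modalities over all positions at once, can be illustrated with a minimal sketch, assuming single-head scaled dot-product self-attention over one pooled token sequence. This is not the paper's implementation: the function names, the shared channel width, the random projection weights, and the absence of positional encodings, normalization, residual connections, and feed-forward layers are all illustrative assumptions. Each feature map is assumed to be already flattened into a list of H×W tokens.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # rows are tokens, columns are channels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention: every token attends to every token."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = 1.0 / math.sqrt(len(wq[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)


def fuse_scales_and_modalities(
    rgb_scales: List[Matrix], depth_scales: List[Matrix],
    wq: Matrix, wk: Matrix, wv: Matrix,
) -> Tuple[List[Matrix], List[Matrix]]:
    """Hypothetical TFFM-style fusion: pool the tokens of all scales of both modalities
    into one sequence, run self-attention once, then split the result back per map."""
    maps = rgb_scales + depth_scales
    tokens = [tok for fmap in maps for tok in fmap]
    fused = self_attention(tokens, wq, wk, wv)
    outputs, start = [], 0
    for fmap in maps:
        outputs.append(fused[start:start + len(fmap)])
        start += len(fmap)
    return outputs[:len(rgb_scales)], outputs[len(rgb_scales):]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random stand-in for learned weights or backbone features (illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    c = 4  # assumed common channel width after per-scale projection
    rgb = [random_matrix(16, c, rng), random_matrix(4, c, rng)]    # 4x4 and 2x2 maps
    depth = [random_matrix(16, c, rng), random_matrix(4, c, rng)]  # same scales for depth
    wq, wk, wv = (random_matrix(c, c, rng) for _ in range(3))
    rgb_out, depth_out = fuse_scales_and_modalities(rgb, depth, wq, wk, wv)
    print([len(m) for m in rgb_out], [len(m) for m in depth_out])  # [16, 4] [16, 4]
```

Because every token attends to every other token in a single pass, a depth token at the coarsest scale can directly influence an RGB token at the finest scale. Stacks of local operations only connect such distant positions through many layers. The cost of this joint attention grows quadratically with the total number of tokens across all scales and both modalities.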

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques