A Unified Structure for Efficient RGB and RGB-D Salient Object Detection
Peng Peng, Yong-Jie Li

TL;DR
This paper introduces a unified, efficient neural network structure with a cross-attention module that effectively handles both RGB and RGB-D salient object detection tasks, outperforming existing methods.
Contribution
The paper presents a novel unified architecture with a cross-attention context extraction module that efficiently fuses RGB and depth information for salient object detection.
Findings
Outperforms state-of-the-art methods on multiple datasets
Effectively fuses RGB and depth data with a unified network
Achieves superior metrics in both RGB and RGB-D SOD tasks
Abstract
Salient object detection (SOD) has been well studied in recent years, especially using deep neural networks. However, SOD with RGB and RGB-D images is usually treated as two different tasks, each requiring a specifically designed network structure. In this paper, we propose a unified and efficient structure with a cross-attention context extraction (CRACE) module to address both SOD tasks. The proposed CRACE module receives and appropriately fuses two (for RGB SOD) or three (for RGB-D SOD) inputs. The simple unified feature pyramid network (FPN)-like structure with CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries. The proposed structure is simple yet effective: it extracts and fuses the rich context information of RGB and depth efficiently. Experimental…
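The abstract does not spell out how CRACE fuses its inputs, so the following is only a minimal sketch of generic scaled dot-product cross-attention applied to two or three feature sets. It is not the authors' module. The names `cross_attend` and `fuse`, the residual connection, and the averaging of attended outputs are all assumptions made for illustration; features are modeled as plain lists of equal-length vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query vector attends over the context vectors.

    Keys and values are both taken to be the raw context vectors
    (no learned projections; an assumption to keep the sketch short).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(d)])
    return out


def fuse(primary, *others):
    """Hypothetical fusion of two or three inputs (not the paper's CRACE design).

    The primary features attend to each other input; the attended results
    are averaged and added back to the primary features as a residual.
    """
    if not 1 <= len(others) <= 2:
        raise ValueError("expected two or three inputs in total")
    attended = [cross_attend(primary, o) for o in others]
    fused = []
    for t, p in enumerate(primary):
        fused.append(
            [p[i] + sum(a[t][i] for a in attended) / len(attended) for i in range(len(p))]
        )
    return fused


# Two inputs (the RGB SOD case in the abstract) and three inputs (the RGB-D case).
two_way = fuse([[1.0, 0.0]], [[0.0, 2.0]])
three_way = fuse([[1.0, 0.0]], [[0.0, 2.0]], [[4.0, 0.0]])
```

In this sketch the same `fuse` function serves both cases, which mirrors the abstract's idea of one module accepting two or three inputs; a real implementation would operate on convolutional feature maps with learned projections.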
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Face Recognition and Perception
