Siamese Network for RGB-D Salient Object Detection and Beyond
Keren Fu, Deng-Ping Fan, Ge-Peng Ji, Qijun Zhao, Jianbing Shen, Ce Zhu

TL;DR
This paper introduces a Siamese network architecture with joint learning and densely cooperative fusion (JL-DCF) for RGB-D salient object detection, improving robustness and generalization over existing methods.
Contribution
It proposes a novel joint learning and densely cooperative fusion framework that learns from RGB and depth through a shared backbone for RGB-D SOD, outperforming previous models and extending to other multi-modal detection tasks.
Findings
Achieves ~2.0% improvement in max F-measure over state-of-the-art models.
Demonstrates robustness and good generalization across seven datasets.
Extends to RGB-thermal (RGB-T) and video SOD, outperforming existing methods on those tasks as well.
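The max F-measure cited in the findings is a standard SOD metric: binarize the predicted saliency map at many thresholds, compute an F-measure against the ground-truth mask at each one, and report the best value. The sketch below assumes the convention common in the SOD literature (β² = 0.3, 256 evenly spaced thresholds over predictions in [0, 1]); the function names and flattened-list inputs are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def f_measure(pred: Sequence[float], gt: Sequence[int], threshold: float,
              beta_sq: float = 0.3) -> float:
    """F-measure of a saliency map binarized at `threshold` (beta_sq=0.3 is an assumed convention)."""
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        positive = p >= threshold
        if positive and g:
            tp += 1
        elif positive and not g:
            fp += 1
        elif not positive and g:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Weighted harmonic mean; beta_sq < 1 weights precision more than recall.
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(pred: Sequence[float], gt: Sequence[int],
                  num_thresholds: int = 256, beta_sq: float = 0.3) -> float:
    """Best F-measure over evenly spaced thresholds in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    return max(f_measure(pred, gt, t, beta_sq) for t in thresholds)
```

For example, `max_f_measure([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])` returns 1.0, because a threshold between 0.3 and 0.8 separates the two salient pixels perfectly. Real benchmarks typically average this quantity over all images in a dataset.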
Abstract
Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL) and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced…
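The core idea of joint learning, passing RGB and depth through one backbone whose weights are shared, can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: `shared_backbone` is a hypothetical stand-in (a 1×1 convolution over channels followed by ReLU) for a real CNN backbone, and replicating the single-channel depth map to three channels so that it matches the RGB input shape is an assumption of this sketch. The names `depth_to_three_channels` and `joint_learning_forward` are likewise hypothetical.

```python
from typing import List, Tuple

Image = List[List[List[float]]]   # [channels][height][width]
FeatureMap = List[List[float]]    # [height][width]


def depth_to_three_channels(depth: List[List[float]]) -> Image:
    """Replicate a one-channel depth map into three channels (assumed preprocessing)."""
    return [[row[:] for row in depth] for _ in range(3)]


def shared_backbone(image: Image, weights: List[float], bias: float) -> FeatureMap:
    """Hypothetical backbone stand-in: 1x1 convolution over channels, then ReLU.

    Every input goes through these same `weights`, which is what makes the
    two branches a Siamese (weight-sharing) network.
    """
    height, width = len(image[0]), len(image[0][0])
    out: FeatureMap = []
    for y in range(height):
        row = []
        for x in range(width):
            s = bias + sum(w * image[c][y][x] for c, w in enumerate(weights))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def joint_learning_forward(rgb: Image, depth: List[List[float]],
                           weights: List[float], bias: float) -> Tuple[FeatureMap, FeatureMap]:
    """Treat RGB and depth as a batch of two and run both through one parameter set."""
    batch = [rgb, depth_to_three_channels(depth)]
    features = [shared_backbone(img, weights, bias) for img in batch]
    return features[0], features[1]
```

Because both modalities update the same parameters during training, the backbone is pushed toward features that are useful for saliency in either modality, which is the cross-modal commonality the abstract refers to. A fusion stage such as DCF would then combine the two returned feature maps; its details are not covered by this sketch.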
Taxonomy
Topics: Visual Attention and Saliency Detection · Olfactory and Sensory Function Studies · Image and Video Quality Assessment
