SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
Xingzhao Jia, Dongye Changlei, Yanjun Peng

TL;DR
SiaTrans introduces a Siamese transformer network for RGB-D salient object detection (SOD) that learns to classify depth image quality during training, enabling adaptive fusion at test time and achieving superior performance with reduced computation.
Contribution
The paper proposes a novel transformer-based RGB-D SOD model that incorporates depth image quality classification to improve fusion and prediction accuracy.
Findings
Achieves state-of-the-art performance on nine RGB-D SOD benchmarks.
Effectively classifies depth image quality to adapt fusion during testing.
Reduces computational complexity compared to existing methods.
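To make the second finding concrete, here is a minimal sketch of fusion gated by a per-image depth quality label. This summary does not state how many quality classes the paper uses or what its fusion rule is, so the binary `depth_is_reliable` flag, the additive fusion, and the function name `fuse` are all illustrative assumptions rather than the authors' method.

```python
import numpy as np

def fuse(rgb_feat, depth_feat, depth_is_reliable):
    """Per-sample fusion: add depth features only where depth is flagged reliable.

    Assumption: a binary quality flag and additive fusion stand in for
    whatever classification scheme and fusion rule the paper actually uses.
    """
    # Turn the flags into a (B, 1) gate of 0.0 / 1.0 that broadcasts over features.
    gate = np.asarray(depth_is_reliable, dtype=rgb_feat.dtype)[:, None]
    return rgb_feat + gate * depth_feat

# Toy features for a batch of three images.
rgb_feat = np.ones((3, 4))
depth_feat = np.full((3, 4), 2.0)

# The second image's depth map is flagged as poor quality, so it is ignored.
fused = fuse(rgb_feat, depth_feat, [True, False, True])
```

With this gate, a sample whose depth map is flagged unreliable falls back to its RGB features alone, which is one simple way misleading depth information could be kept out of the prediction.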
Abstract
RGB-D SOD uses depth information to handle challenging scenes and obtain high-quality saliency maps. Existing state-of-the-art RGB-D saliency detection methods overwhelmingly rely on the strategy of directly fusing depth information. Although these methods improve the accuracy of saliency prediction through various cross-modality fusion strategies, misinformation provided by some poor-quality depth images can affect the saliency prediction result. To address this issue, a novel RGB-D salient object detection model (SiaTrans) is proposed in this paper, which allows training on depth image quality classification at the same time as training on SOD. In light of the common information between RGB and depth images on salient objects, SiaTrans uses a Siamese transformer network with shared weight parameters as the encoder and extracts RGB and depth features concatenated on the batch…
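The shared-weight encoder idea in the abstract can be sketched as follows: RGB and depth inputs are stacked along the batch axis, passed once through a single encoder, and split back into two feature sets. This is a toy NumPy illustration, not the paper's architecture. The linear-plus-ReLU encoder stands in for the transformer, the inputs are flattened vectors, and the names `make_encoder` and `siamese_encode` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, out_dim):
    """Return a toy encoder (one linear layer + ReLU) standing in for a transformer."""
    weight = rng.standard_normal((in_dim, out_dim)) * 0.1
    bias = np.zeros(out_dim)

    def encode(x):
        return np.maximum(x @ weight + bias, 0.0)

    return encode

def siamese_encode(encode, rgb, depth):
    """Run one shared encoder over RGB and depth stacked on the batch axis."""
    # Assumption: depth has been brought to the same shape as RGB beforehand.
    assert rgb.shape == depth.shape
    batch = rgb.shape[0]
    stacked = np.concatenate([rgb, depth], axis=0)  # shape (2B, D)
    features = encode(stacked)                      # single pass, shared weights
    return features[:batch], features[batch:]       # split back per modality

encoder = make_encoder(in_dim=48, out_dim=16)
rgb = rng.standard_normal((4, 48))
depth = rng.standard_normal((4, 48))
rgb_feat, depth_feat = siamese_encode(encoder, rgb, depth)
```

Because both modalities pass through the same weights, the encoder holds one set of parameters instead of two, and a single forward pass over the doubled batch replaces two separate passes.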
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image Fusion Techniques
