Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection
Nian Liu, Ni Zhang, Ling Shao, Junwei Han

TL;DR
This paper introduces a mutual attention and contrast inference framework for RGB-D saliency detection. It combines high-order cross-modal interaction with selective attention that accounts for depth quality to improve performance on RGB-D salient object detection benchmarks.
Contribution
The paper proposes a mutual attention model with high-order cross-modal interaction and a contrast inference mechanism, along with selective attention that accounts for depth quality, all embedded in a two-stream CNN.
Findings
Demonstrates superior performance on RGB-D salient object detection (SOD) benchmarks
Constructs a new large-scale high-quality RGB-D dataset
Shows effectiveness of mutual attention and contrast modules
Abstract
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection. Early fusion and result fusion schemes fuse RGB and depth information at the input and output stages, respectively, and hence incur the problem of distribution gap or information loss. Many models use the feature fusion strategy but are limited by low-order point-to-point fusion methods. In this paper, we propose a novel mutual attention model by fusing attention and contexts from different modalities. We use the non-local attention of one modality to propagate long-range contextual dependencies for the other modality, thus leveraging complementary attention cues to perform high-order and trilinear cross-modal interaction. We also propose to induce contrast inference from the mutual attention and obtain a unified model. Considering that low-quality depth data may be detrimental to the model…
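The mutual attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes flattened feature maps stored as lists of per-position feature vectors, identity embeddings in place of learned projections, dot-product affinities with a softmax, and a residual addition. All function names here are hypothetical. The affinity is bilinear in one modality's features and is then multiplied with the other modality's features, which is one way to read the "trilinear" interaction.

```python
import math


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def nonlocal_attention(feats):
    """Pairwise affinity between all positions of one modality.

    Assumption: identity embeddings, so the affinity is feats @ feats^T.
    A real model would use learned projections here.
    """
    return softmax_rows(matmul(feats, transpose(feats)))


def mutual_attention(rgb, depth):
    """Use each modality's attention to aggregate the other's features."""
    attn_rgb = nonlocal_attention(rgb)
    attn_depth = nonlocal_attention(depth)
    # Depth attention propagates long-range context over RGB features,
    # and RGB attention does the same for depth features.
    rgb_ctx = matmul(attn_depth, rgb)
    depth_ctx = matmul(attn_rgb, depth)
    # Assumption: residual connection, as in standard non-local blocks.
    rgb_out = [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(rgb, rgb_ctx)]
    depth_out = [[f + c for f, c in zip(fr, cr)]
                 for fr, cr in zip(depth, depth_ctx)]
    return rgb_out, depth_out


# Toy input: 3 spatial positions, 2 channels per modality.
rgb_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth_feats = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
rgb_fused, depth_fused = mutual_attention(rgb_feats, depth_feats)
```

In this sketch, uninformative depth features (identical at every position) yield uniform attention, so the context added to each RGB position is just the mean RGB feature. This gives some intuition for why the quality of the depth input matters to this kind of fusion.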
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Image and Video Quality Assessment
