Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection

Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, and Sam Kwong

arXiv:2308.08930·cs.CV·December 10, 2024·6 cites

Open Access 2 Repos

TL;DR

This paper proposes a novel RGB-D salient object detection network that combines Transformer-based cross-modality interaction with CNN-based refinement to improve detection accuracy in complex scenes.

Contribution

It introduces a point-aware interaction module for cross-modality feature exploration and a CNN-induced refinement unit to address Transformer limitations.

Findings

01. Achieves competitive results on five RGB-D SOD datasets.

02. Effectively models global dependencies and local details.

03. Enhances detection performance in challenging scenes.

Abstract

By integrating complementary information from RGB image and depth map, the ability of salient object detection (SOD) for complex and challenging scenes can be improved. In recent years, the important role of Convolutional Neural Networks (CNNs) in feature extraction and cross-modality interaction has been fully explored, but it is still insufficient in modeling global long-range dependencies of self-modality and cross-modality. To this end, we introduce CNNs-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement (PICR-Net). On the one hand, considering the prior correlation between RGB modality and depth modality, an attention-triggered cross-modality point-aware interaction (CmPI) module is designed to explore the feature interaction of different modalities with positional constraints. On the other hand, in order…
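As a rough illustration of what "interaction of different modalities with positional constraints" could look like, the sketch below restricts attention to the RGB and depth features that share the same spatial position, so each position forms its own two-token sequence. This is one plausible reading of the abstract, not the authors' CmPI module: the function name `pointwise_cross_modal_attention`, the projection matrices `wq`, `wk`, `wv`, and all shapes are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pointwise_cross_modal_attention(rgb, depth, wq, wk, wv):
    """Hypothetical point-wise attention between two modalities.

    rgb, depth: arrays of shape (N, C), one feature vector per spatial position.
    wq, wk, wv: projection matrices of shape (C, D) (illustrative parameters).
    Returns updated RGB features (N, D), updated depth features (N, D),
    and the attention weights (N, 2, 2).
    """
    # Each position contributes exactly two tokens: its RGB and depth feature.
    tokens = np.stack([rgb, depth], axis=1)            # (N, 2, C)
    q = tokens @ wq                                    # (N, 2, D)
    k = tokens @ wk                                    # (N, 2, D)
    v = tokens @ wv                                    # (N, 2, D)

    # Scaled dot-product attention, computed independently per position,
    # so features never attend to other spatial locations.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (N, 2, 2)
    weights = softmax(scores, axis=-1)
    out = weights @ v                                  # (N, 2, D)
    return out[:, 0], out[:, 1], weights


# Example usage with random features for a 4x4 feature map (N = 16).
rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(16, 32))
depth_feat = rng.normal(size=(16, 32))
wq, wk, wv = (rng.normal(size=(32, 8)) for _ in range(3))
rgb_new, depth_new, attn = pointwise_cross_modal_attention(
    rgb_feat, depth_feat, wq, wk, wv
)
```

Under this assumption the attention matrix per position is only 2×2, so the cost grows linearly with the number of positions rather than quadratically as in full cross-modality attention over all position pairs.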

Taxonomy

Topics: Visual Attention and Saliency Detection · Virtual Reality Applications and Impacts · Face Recognition and Perception

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Residual Connection