Middle-level Fusion for Lightweight RGB-D Salient Object Detection

Nianchang Huang; Qiang Zhang; Jungong Han

arXiv:2104.11543 · cs.CV · November 23, 2022 · 1 citation

TL;DR

This paper introduces a lightweight RGB-D salient object detection (SOD) model built on middle-level fusion. It captures cross-modal information while using only 3.9M parameters and running at 33 FPS.

Contribution

The paper proposes a middle-level fusion structure for RGB-D SOD that aims to exploit cross-modal information better than single-stream models while requiring fewer parameters than two-stream models.

Findings

01 Model contains only 3.9M parameters.

02 Runs at 33 FPS, enabling real-time applications.

03 Outperforms state-of-the-art methods on benchmark datasets.

Abstract

Most existing lightweight RGB-D salient object detection (SOD) models are based on either a two-stream or a single-stream structure. The former first uses two sub-networks to extract unimodal features from the RGB and depth images, respectively, and then fuses them for SOD. The latter directly extracts multi-modal features from the input RGB-D images and then focuses on exploiting cross-level complementary information. However, two-stream models inevitably require more parameters, and single-stream models cannot fully exploit the cross-modal complementary information because they ignore the modality difference. To address these issues, this paper proposes a middle-level fusion structure for designing lightweight RGB-D SOD models, which first employs two sub-networks to extract low- and middle-level unimodal features, respectively, and…
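The parameter trade-off between the three structures can be illustrated with a rough count. The sketch below is not the paper's architecture: the channel widths, the plain 3x3 convolution stages, and the concatenation followed by a 1x1 convolution used as the fusion step are all hypothetical. Because the abstract is truncated here, the shared high-level stage after fusion is also an assumption about what "middle-level fusion" implies.

```python
# Illustrative parameter count only. All widths and the fusion step are
# hypothetical assumptions, not values or design choices from the paper.

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def stack_params(channels, k=3):
    """Parameters of a plain chain of convolutions c0 -> c1 -> c2 -> ..."""
    return sum(conv_params(a, b, k) for a, b in zip(channels, channels[1:]))


# Hypothetical stage widths: the first three stages stand in for the
# low- and middle-level features, the last two for the high-level features.
WIDTHS = [16, 32, 64, 128, 256]
SPLIT = 3  # number of stages kept separate per modality (assumption)


def two_stream():
    """Two full encoders, one for RGB (3 channels) and one for depth (1 channel)."""
    return stack_params([3] + WIDTHS) + stack_params([1] + WIDTHS)


def single_stream():
    """One encoder fed the 4-channel RGB-D input directly."""
    return stack_params([4] + WIDTHS)


def middle_fusion():
    """Separate low/middle stages, then one shared high-level stage (assumed)."""
    rgb = stack_params([3] + WIDTHS[:SPLIT])
    depth = stack_params([1] + WIDTHS[:SPLIT])
    mid = WIDTHS[SPLIT - 1]
    # Assumed fusion: concatenate both middle-level maps, reduce with a 1x1 conv.
    fuse = conv_params(2 * mid, mid, k=1)
    shared = stack_params(WIDTHS[SPLIT - 1:])
    return rgb + depth + fuse + shared


if __name__ == "__main__":
    for name, fn in [("two-stream", two_stream),
                     ("single-stream", single_stream),
                     ("middle-level fusion", middle_fusion)]:
        print(f"{name:>20}: {fn():,} parameters")
```

Under these assumed widths, the middle-level layout lands between the other two: the deep, wide stages dominate the count, so sharing them removes most of the two-stream overhead while the early stages still treat each modality separately.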

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications