Middle-level Fusion for Lightweight RGB-D Salient Object Detection
Nianchang Huang, Qiang Zhang, Jungong Han

TL;DR
This paper introduces a lightweight RGB-D salient object detection model based on middle-level fusion, which captures cross-modal information effectively while keeping the parameter count small (3.9M) and running in real time (33 FPS).
Contribution
The paper proposes a novel middle-level fusion structure for RGB-D SOD that exploits cross-modal features better than single-stream models and requires fewer parameters than two-stream models.
Findings
Model contains only 3.9M parameters.
Runs at 33 FPS, enabling real-time applications.
Outperforms state-of-the-art methods on benchmark datasets.
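The two headline numbers translate into a rough deployment budget. The following is a back-of-envelope sketch, assuming the weights are stored as 32-bit floats; the source states neither the storage precision nor the hardware on which 33 FPS was measured, and the helper names are hypothetical.

```python
REPORTED_PARAMS = 3.9e6  # parameter count listed above
REPORTED_FPS = 33        # throughput listed above


def frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available per frame at a given FPS."""
    return 1000.0 / fps


def weight_size_mb(n_params: float, bytes_per_param: int = 4) -> float:
    """Weight storage in decimal megabytes; 4 bytes per value assumes float32."""
    return n_params * bytes_per_param / 1e6


print(f"per-frame time:  {frame_budget_ms(REPORTED_FPS):.1f} ms")
print(f"float32 weights: {weight_size_mb(REPORTED_PARAMS):.1f} MB")
print(f"float16 weights: {weight_size_mb(REPORTED_PARAMS, 2):.1f} MB")
```

At 33 FPS each frame takes about 30 ms on average, and the weights alone occupy roughly 15.6 MB in float32 (about half in float16). Activation memory and runtime overhead are not included in these figures.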
Abstract
Most existing lightweight RGB-D salient object detection (SOD) models are based on either a two-stream structure or a single-stream structure. The former first uses two sub-networks to extract unimodal features from the RGB and depth images, respectively, and then fuses them for SOD. The latter, in contrast, directly extracts multi-modal features from the input RGB-D images and then focuses on exploiting cross-level complementary information. However, two-stream models inevitably require more parameters, while single-stream models cannot fully exploit the cross-modal complementary information because they ignore the modality difference. To address these issues, this paper employs a middle-level fusion structure to design a lightweight RGB-D SOD model, which first employs two sub-networks to extract low- and middle-level unimodal features, respectively, and…
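The abstract distinguishes the three layouts by where the RGB and depth streams merge. A toy parameter count makes the trade-off concrete. This is a sketch under stated assumptions, not the authors' network: the stage widths, the two-convolution stages, the 1-channel depth input and the concatenation-plus-1x1-convolution fusion are all hypothetical choices, since the abstract is cut off before it describes the fusion step.

```python
# Toy parameter counts for three RGB-D backbone layouts.
# Widths, stage design and the fusion operator are hypothetical.

def conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Weights plus biases of one k x k convolution."""
    return in_ch * out_ch * k * k + out_ch


def stage_params(in_ch: int, out_ch: int) -> int:
    """One stage = two 3x3 convolutions; the first changes the width."""
    return conv_params(in_ch, out_ch) + conv_params(out_ch, out_ch)


def branch_params(in_ch: int, widths: list) -> int:
    """Stack of stages; each stage consumes the previous stage's output."""
    total = 0
    for out_ch in widths:
        total += stage_params(in_ch, out_ch)
        in_ch = out_ch
    return total


LOW_MID = [32, 64, 128]  # hypothetical low- and middle-level widths
HIGH = [256, 512]        # hypothetical high-level widths
RGB_CH, DEPTH_CH = 3, 1


def two_stream() -> int:
    # Two full backbones, one per modality; the late-fusion head is not counted.
    return (branch_params(RGB_CH, LOW_MID + HIGH)
            + branch_params(DEPTH_CH, LOW_MID + HIGH))


def single_stream() -> int:
    # One backbone on the stacked 4-channel RGB-D input.
    return branch_params(RGB_CH + DEPTH_CH, LOW_MID + HIGH)


def middle_level() -> int:
    # Separate low/middle-level branches per modality, then an assumed
    # concat + 1x1 conv fusion, then one shared high-level branch.
    c = LOW_MID[-1]
    unimodal = branch_params(RGB_CH, LOW_MID) + branch_params(DEPTH_CH, LOW_MID)
    fusion = conv_params(2 * c, c, k=1)
    shared = branch_params(c, HIGH)
    return unimodal + fusion + shared


for name, fn in [("two-stream", two_stream),
                 ("single-stream", single_stream),
                 ("middle-level", middle_level)]:
    print(f"{name:>13}: {fn():,} parameters")
```

With these toy widths the high-level stages hold most of the weights, so sharing them brings the middle-level layout close to the single-stream count (about 5.0M vs. 4.7M, against 9.4M for two streams) while still giving each modality its own low- and middle-level features. The counts only illustrate the size argument; they say nothing about detection accuracy.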
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
