TL;DR
This paper introduces BBS-Net, a novel RGB-D salient object detection model that uses a bifurcated backbone and depth-enhanced modules to improve multi-modal feature fusion, achieving state-of-the-art results.
Contribution
The paper proposes a bifurcated backbone strategy and depth-enhanced modules for improved multi-modal feature fusion in RGB-D salient object detection.
Findings
BBS-Net outperforms 18 SOTA models on 8 datasets.
Achieves an approximately 4% improvement in S-measure over the top-ranked competing models.
Demonstrates strong generalization across datasets.
Abstract
Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment and classify objects at various scales. When multi-level features meet multi-modal cues, the optimal feature aggregation and multi-modal learning strategy become a hot potato. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel cascaded refinement network. In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. Then, RGB and depth modalities are fused in a complementary way. Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is simple, efficient, and backbone-independent.…
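To make the depth-enhanced module more concrete, the following is a minimal sketch of attending to depth features from the channel view and then the spatial view before fusing them with RGB features. It is not the authors' implementation: the function names are hypothetical, the attention weights come from parameter-free max pooling plus a sigmoid where a real module would use learned layers, and fusing by element-wise addition is an assumption that the abstract does not confirm.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global max.

    feat is a list of C channels, each an H x W nested list.
    Assumption: a learned layer would normally follow the pooling step.
    """
    out = []
    for channel in feat:
        peak = max(max(row) for row in channel)
        weight = sigmoid(peak)
        out.append([[weight * v for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each position by the sigmoid of the max over channels there."""
    height, width = len(feat[0]), len(feat[0][0])
    mask = [
        [sigmoid(max(ch[i][j] for ch in feat)) for j in range(width)]
        for i in range(height)
    ]
    return [
        [[mask[i][j] * ch[i][j] for j in range(width)] for i in range(height)]
        for ch in feat
    ]


def depth_enhanced_fusion(rgb, depth):
    """Enhance depth (channel view, then spatial view) and add it to RGB.

    Hypothetical helper: addition is an assumed fusion rule.
    """
    enhanced = spatial_attention(channel_attention(depth))
    return [
        [[r + d for r, d in zip(rgb_row, dep_row)]
         for rgb_row, dep_row in zip(rgb_ch, dep_ch)]
        for rgb_ch, dep_ch in zip(rgb, enhanced)
    ]


# Toy input: 2 channels of 2 x 2 features per modality.
rgb = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
depth = [[[0.0, 1.0], [2.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
fused = depth_enhanced_fusion(rgb, depth)
```

The fused result keeps the shape of the RGB input, so in this sketch the same step could be applied independently at every feature level.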
