TL;DR
This paper introduces SPNet, a novel RGB-D saliency detection framework that explicitly preserves modality-specific features while effectively integrating shared information, leading to improved detection performance.
Contribution
The study proposes a new framework with modality-specific networks and shared learning, along with novel modules for cross-modal feature fusion and aggregation, enhancing RGB-D saliency detection.
Findings
Outperforms state-of-the-art methods on six RGB-D SOD benchmarks.
Achieves superior results on three camouflaged object detection benchmarks.
Effectively preserves modality-specific features while integrating shared information.
Abstract
Salient object detection (SOD) on RGB and depth images has attracted increasing research interest, due to its effectiveness and the fact that depth cues can now be conveniently captured. Existing RGB-D SOD models usually adopt different fusion strategies to learn a shared representation from the two modalities (i.e., RGB and depth), while few methods explicitly consider how to preserve modality-specific characteristics. In this study, we propose a novel framework, termed SPNet (Specificity-preserving network), which benefits SOD performance by exploring both the shared information and modality-specific properties (e.g., specificity). Specifically, we propose to adopt two modality-specific networks and a shared learning network to generate individual and shared saliency prediction maps, respectively. To effectively fuse cross-modal features in the shared learning network, we propose a…
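The three-branch layout described in the abstract (two modality-specific networks plus a shared learning network, each producing its own saliency map) can be illustrated with a toy data-flow sketch. This is not the SPNet implementation: the per-pixel linear heads, all weights, and the names `linear_head` and `predict_maps` are hypothetical, and plain concatenation of RGB and depth values is an assumed stand-in for the cross-modal fusion module, which the truncated abstract does not name.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear_head(weights, bias):
    """Build a per-pixel scorer: weighted sum of a feature vector, squashed to (0, 1)."""
    def head(features):
        return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
    return head


def predict_maps(rgb, depth, rgb_head, depth_head, shared_head):
    """rgb is H x W x 3 (nested lists), depth is H x W (nested lists).

    Returns three H x W saliency maps: RGB-specific, depth-specific, shared.
    """
    rgb_map, depth_map, shared_map = [], [], []
    for rgb_row, depth_row in zip(rgb, depth):
        # Modality-specific branches see only their own input.
        rgb_map.append([rgb_head(p) for p in rgb_row])
        depth_map.append([depth_head([d]) for d in depth_row])
        # ASSUMPTION: concatenation stands in for the paper's fusion module.
        shared_map.append(
            [shared_head(list(p) + [d]) for p, d in zip(rgb_row, depth_row)]
        )
    return rgb_map, depth_map, shared_map


# Toy inputs: a random 4 x 5 RGB image and a matching depth map.
random.seed(0)
H, W = 4, 5
rgb = [[[random.random() for _ in range(3)] for _ in range(W)] for _ in range(H)]
depth = [[random.random() for _ in range(W)] for _ in range(H)]

# Hypothetical, hand-picked weights; a real model would learn these.
rgb_head = linear_head([0.5, -0.2, 0.1], 0.0)
depth_head = linear_head([1.0], -0.5)
shared_head = linear_head([0.3, 0.3, 0.3, 0.6], -0.4)

rgb_map, depth_map, shared_map = predict_maps(
    rgb, depth, rgb_head, depth_head, shared_head
)
```

Each branch returns a map with the same spatial size as the input, so the individual and shared predictions can be compared pixel by pixel. How SPNet combines them into a final prediction is not covered by the text above.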
