HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection
Kang Yi, Jing Xu, Xiao Jin, Fu Guo, Yan-Feng Wu

TL;DR
HODINet is a high-order discrepant interaction network that encodes RGB images and depth maps with different backbones and fuses the two modalities at multiple stages to improve salient object detection.
Contribution
The paper proposes a high-order interaction framework for RGB-D SOD, with stage-specific cross-modality fusion modules and a cascaded decoding process. Together these address the discrepancy between RGB and depth features and between features from different stages.
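This summary does not detail the cascaded decoding process. As a rough illustration only, the NumPy sketch below shows a generic top-down cascade: the deepest fused feature is repeatedly upsampled and merged with the next shallower one until the finest resolution is reached, and a saliency map is read out at the end. The names `upsample2x` and `cascaded_decode`, the nearest-neighbour upsampling, the additive merge followed by a ReLU, and the equal channel count across stages are all hypothetical assumptions, not HODINet's actual decoder. Learned layers (convolutions, normalization) are deliberately omitted.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def cascaded_decode(stages: list) -> np.ndarray:
    """Hypothetical top-down cascade over fused features ordered shallow -> deep.

    Assumes every stage is (C, H, W) with the same C, and that each deeper
    stage has exactly half the spatial size of the previous one.
    """
    x = stages[-1]  # start from the deepest (coarsest) fused feature
    for skip in reversed(stages[:-1]):
        # Upsample the coarser decoded feature, add the shallower stage,
        # and apply a ReLU as a stand-in for a learned refinement block.
        x = np.maximum(upsample2x(x) + skip, 0.0)
    # Collapse channels into one logit map and squash it to a saliency map.
    logits = x.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-logits))


# Usage: four fused stages at 32x32, 16x16, 8x8 and 4x4 resolution.
rng = np.random.default_rng(0)
stages = [rng.standard_normal((8, 32 // 2**i, 32 // 2**i)) for i in range(4)]
saliency = cascaded_decode(stages)
print(saliency.shape)  # (32, 32)
```

A real decoder would replace the fixed ReLU merge with trainable layers; the sketch only shows the coarse-to-fine order in which a cascade consumes the stages.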
Findings
Achieves competitive performance on seven datasets.
Outperforms 24 state-of-the-art methods.
Effective high-order feature fusion improves detection accuracy.
Abstract
RGB-D salient object detection (SOD) aims to detect prominent regions by jointly modeling RGB and depth information. Most RGB-D SOD methods use the same type of backbone and fusion module to learn multimodal and multistage features identically. However, these features contribute differently to the final saliency results, which raises two issues: 1) how to model the discrepant characteristics of RGB images and depth maps; 2) how to fuse these cross-modality features at different stages. In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD. Concretely, we first employ transformer-based and CNN-based architectures as backbones to encode RGB and depth features, respectively. Then, high-order representations are carefully extracted and embedded into spatial and channel attentions for cross-modality feature fusion at different stages.…
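The abstract states that high-order representations are embedded into spatial and channel attentions, but the excerpt does not say how they are computed. One common way to obtain a second-order statistic is a channel covariance matrix or a pixel-wise product of the two modalities. The NumPy sketch below illustrates that idea under loud assumptions: the helper names `channel_attention_2nd_order`, `spatial_attention_2nd_order` and `fuse`, the use of covariance row means and an RGB-times-depth product as the high-order terms, and the pairing of spatial attention with a high-resolution stage and channel attention with a low-resolution stage are all hypothetical, not the paper's modules. Learned projections are omitted.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_2nd_order(x: np.ndarray) -> np.ndarray:
    """Hypothetical per-channel weights from the channel covariance of (C, H, W)."""
    c = x.shape[0]
    flat = x.reshape(c, -1)
    flat = flat - flat.mean(axis=1, keepdims=True)  # centre each channel
    cov = flat @ flat.T / flat.shape[1]             # (C, C) second-order statistics
    return sigmoid(cov.mean(axis=1))                # one weight in (0, 1) per channel


def spatial_attention_2nd_order(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Hypothetical per-pixel weights from the RGB x depth product (degree-2 term)."""
    return sigmoid((rgb * depth).mean(axis=0))      # (H, W)


def fuse(rgb: np.ndarray, depth: np.ndarray, mode: str) -> np.ndarray:
    """Fuse two (C, H, W) features with a hypothetical high-order attention."""
    if mode == "spatial":
        w = spatial_attention_2nd_order(rgb, depth)[None, :, :]  # broadcast over C
        return (rgb + depth) * w
    if mode == "channel":
        x = np.concatenate([rgb, depth], axis=0)                 # (2C, H, W)
        x = x * channel_attention_2nd_order(x)[:, None, None]
        c = rgb.shape[0]
        return x[:c] + x[c:]                                     # back to (C, H, W)
    raise ValueError(f"unknown mode: {mode!r}")


# Usage: RGB and depth features of matching shape from one encoder stage.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((16, 32, 32))
depth_feat = rng.standard_normal((16, 32, 32))
early = fuse(rgb_feat, depth_feat, "spatial")  # assumed: high-resolution stage
late = fuse(rgb_feat, depth_feat, "channel")   # assumed: low-resolution stage
print(early.shape, late.shape)  # (16, 32, 32) (16, 32, 32)
```

The covariance captures how channels co-vary across the image (a second-order statistic), while the element-wise product lets each pixel's weight depend jointly on both modalities; either is "high-order" relative to plain average pooling.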
Taxonomy
Topics: Visual Attention and Saliency Detection · Virtual Reality Applications and Impacts · Face Recognition and Perception
