# Cross-modal interactive and global awareness fusion network for RGB-D salient object detection

**Authors:** Runqing Li, Ling Yu, Zijian Jiang, Fanglin Niu

PMC · DOI: 10.1371/journal.pone.0325301 · PLOS One · 2025-06-12

## TL;DR

This paper introduces CIGNet, a new network for detecting salient objects in RGB-D images that improves accuracy in complex scenes.

## Contribution

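CIGNet fuses RGB and depth features in two ways: a Cross-modal Interaction Fusion Module (CIFM) for low-level features and a Global Awareness Fusion Module (GAFM) for high-level features. A Multi-layer Convolutional Fusion Module (MCFM) then decodes the fused features step by step into the prediction map.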

## Key findings

- CIGNet outperforms 12 mainstream methods on six benchmark datasets.
- The proposed fusion modules improve detection of small and multiple objects in complex scenes.

## Abstract

RGB-D salient object detection has garnered significant attention in recent years due to its excellent performance. It outperforms salient object detection methods that rely solely on RGB images by leveraging the geometric morphology and spatial layout information in depth images. However, existing RGB-D detection models still have difficulty accurately recognising and highlighting salient objects in complex scenes that contain multiple or small objects. This study proposes a Cross-modal Interactive and Global Awareness Fusion Network for RGB-D salient object detection, named CIGNet. Specifically, convolutional neural networks (CNNs), which are good at extracting local details, and an attention mechanism, which efficiently integrates global information, are used to design two fusion methods for RGB and depth images. The first, the Cross-modal Interaction Fusion Module (CIFM), employs depthwise separable convolution and common-dimensional dynamic convolution to extract rich edge contours and texture details from low-level features. The second, the Global Awareness Fusion Module (GAFM), relates high-level RGB and depth features to improve the model’s understanding of complex scenes. In addition, the prediction map is generated through a step-by-step decoding process carried out by the Multi-layer Convolutional Fusion Module (MCFM), which gradually yields finer detection results. Finally, comparisons with 12 mainstream methods on six public benchmark datasets demonstrate that CIGNet achieves superior robustness and accuracy.
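The separable convolution that the abstract attributes to CIFM can be illustrated with a minimal pure-Python sketch. This is the generic depthwise-then-pointwise operation, not the authors' CIFM implementation: the function names (`depthwise_conv`, `pointwise_conv`, `depthwise_separable_conv`, `param_counts`) are hypothetical, and the sketch assumes stride 1, no padding, and no bias terms.

```python
def depthwise_conv(x, kernels):
    """Filter each channel with its own k x k kernel (no channel mixing).

    x: nested list shaped [C][H][W]; kernels: nested list shaped [C][k][k].
    Assumes stride 1 and no padding, so each output map is (H-k+1) x (W-k+1).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h_out = len(channel) - k + 1
        w_out = len(channel[0]) - k + 1
        out.append([
            [
                sum(channel[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
                for j in range(w_out)
            ]
            for i in range(h_out)
        ])
    return out


def pointwise_conv(x, weights):
    """Mix channels with a 1 x 1 convolution.

    x: [C_in][H][W]; weights: [C_out][C_in].
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [sum(row[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
            for i in range(h)
        ]
        for row in weights
    ]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise spatial filtering followed by pointwise channel mixing."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


def param_counts(c_in, c_out, k):
    """Weight counts (biases ignored) for a standard vs. separable conv layer."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable
```

For a layer mapping 64 to 128 channels with a 3 × 3 kernel, `param_counts(64, 128, 3)` gives 73,728 weights for a standard convolution against 8,768 for the separable form, which is the usual reason such layers are chosen for dense low-level feature maps.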

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12161532/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12161532/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12161532/full.md

---
Source: https://tomesphere.com/paper/PMC12161532