# Graph-Based Uncertainty Modeling and Multimodal Fusion for Salient Object Detection

**Authors:** Yuqi Xiong, Wuzhen Shi, Yang Wen, Ruhan Liu

arXiv: 2508.20415 · 2025-08-29

## TL;DR

This paper introduces DUP-MCRNet, a network that propagates uncertainty over a sparse graph and fuses RGB, depth, and edge features with learnable gating weights, yielding sharper edges and more robust salient object detection in complex scenes.

## Contribution

The paper proposes a dynamic uncertainty graph convolution and a multimodal fusion strategy with learnable weights, improving detection accuracy and robustness over existing methods.
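To make the idea of fusion with learnable weights concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the use of a softmax to normalize the gates, and the representation of attention maps as nested lists are all assumptions made for illustration. In a real network the gate logits would be trainable parameters (possibly conditioned on the input), and the maps would be tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_maps(maps, gate_logits):
    """Blend per-modality attention maps with normalized gate weights.

    maps:        dict of modality name -> 2D list of floats (same shape each)
    gate_logits: dict of modality name -> float (hypothetical learnable scalar)

    Returns the fused 2D map and the normalized weight per modality.
    Softmax normalization is an assumption; the paper only states that
    the gating weights are learnable.
    """
    names = list(maps)
    weights = softmax([gate_logits[name] for name in names])
    height = len(maps[names[0]])
    width = len(maps[names[0]][0])

    fused = [[0.0] * width for _ in range(height)]
    for name, weight in zip(names, weights):
        for i in range(height):
            for j in range(width):
                fused[i][j] += weight * maps[name][i][j]
    return fused, dict(zip(names, weights))


# Toy 1x2 attention maps for the three modalities named in the paper.
toy_maps = {
    "rgb": [[0.9, 0.2]],
    "depth": [[0.3, 0.2]],
    "edge": [[0.6, 0.2]],
}
# Equal logits give each modality the same weight (plain averaging).
equal_gates = {"rgb": 0.0, "depth": 0.0, "edge": 0.0}
fused_map, gate_weights = fuse_attention_maps(toy_maps, equal_gates)
```

Raising one modality's logit shifts the fused map toward that modality, which is the mechanism by which a gate could down-weight an unreliable input such as a noisy depth map.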

## Key findings

- Outperforms various existing SOD methods on most common benchmark datasets.
- Improves edge clarity and robustness in complex scenes.
- Effectively fuses multimodal information to enhance salient object detection.

## Abstract

Existing salient object detection (SOD) methods tend to lose details, blur edges, and fuse single-modal information insufficiently in complex scenes. To address these problems, this paper proposes a dynamic uncertainty propagation and multimodal collaborative reasoning network (DUP-MCRNet). First, a dynamic uncertainty graph convolution module (DUGC) propagates uncertainty between layers through a sparse graph constructed from spatial semantic distance; combined with channel adaptive interaction, it improves detection accuracy for small structures and edge regions. Second, a multimodal collaborative fusion strategy (MCF) uses learnable modality gating weights to fuse the attention maps of RGB, depth, and edge features. It dynamically adjusts the importance of each modality according to the scene, suppresses redundant or interfering information, and strengthens semantic complementarity and consistency across modalities, thereby improving the identification of salient regions under occlusion, weak texture, or background interference. Finally, detection performance at the pixel level and region level is optimized through multi-scale BCE and IoU losses, cross-scale consistency constraints, and uncertainty-guided supervision. Extensive experiments show that DUP-MCRNet outperforms various SOD methods on most common benchmark datasets, especially in edge clarity and robustness to complex backgrounds. Our code is publicly available at https://github.com/YukiBear426/DUP-MCRNet.
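The multi-scale BCE and IoU terms mentioned in the abstract can be sketched as follows. This is an illustrative version under stated assumptions, not the paper's loss: it uses a common soft (differentiable) IoU formulation, sums the two terms with equal weight, takes hypothetical per-scale weights, and treats each prediction as a flat list of probabilities. The cross-scale consistency and uncertainty-guided terms are omitted because the abstract does not specify their form.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU, using products and sums in place of set operations."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (intersection + eps) / (union + eps)


def multiscale_loss(preds, targets, scale_weights):
    """Weighted sum of (BCE + soft IoU) over several output scales.

    preds / targets: lists of flat per-scale maps, one entry per scale.
    scale_weights:   hypothetical per-scale weights (an assumption here).
    """
    return sum(
        w * (bce_loss(p, t) + soft_iou_loss(p, t))
        for p, t, w in zip(preds, targets, scale_weights)
    )


# Two toy scales: a 4-pixel map and a 2-pixel map.
targets = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0]]
good_preds = [[0.9, 0.8, 0.1, 0.2], [0.9, 0.1]]
bad_preds = [[0.2, 0.1, 0.9, 0.8], [0.1, 0.9]]
weights = [1.0, 0.5]

good_loss = multiscale_loss(good_preds, targets, weights)
bad_loss = multiscale_loss(bad_preds, targets, weights)
```

BCE scores each pixel independently, while the IoU term scores the overlap of the whole predicted region, which is why the two are often paired for pixel-level and region-level supervision.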

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20415/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20415/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/2508.20415/full.md

---
Source: https://tomesphere.com/paper/2508.20415