# Study on salient object segmentation based on depth information guidance and SAM low-rank adaptation fine-tuning

**Authors:** Weiping M.A.

PMC · DOI: 10.1371/journal.pone.0340765 · PLOS One · 2026-01-23

## TL;DR

This paper introduces a salient object segmentation method that combines RGB images, depth cues from a pre-trained depth estimation network, and the Segment Anything Model (SAM) to improve accuracy and efficiency in complex scenes.

## Contribution

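The approach extracts RGB features with SAM, obtains geometric cues from a pre-trained depth estimation network, and fuses the two modalities with cross-modal attention. It adapts the model through lightweight LoRA fine-tuning with frozen pre-trained weights, and a UNet decoder refines the output to preserve object boundary detail.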
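
The cross-modal fusion step can be illustrated with a minimal NumPy sketch of single-head cross-attention, in which flattened RGB (SAM) features act as queries and depth features supply keys and values. This summary does not specify the paper's exact attention design, so the query/key assignment, the single head, the residual connection, and the names `cross_modal_attention`, `fuse`, `w_q`, `w_k`, and `w_v` are all illustrative assumptions. In a trained model the projections would be learned rather than random.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb_tokens, depth_tokens, w_q, w_k, w_v):
    """Single-head cross-attention in which RGB tokens query depth tokens.

    rgb_tokens:    (N, C) flattened RGB feature map (e.g. from a SAM-style encoder)
    depth_tokens:  (M, C) flattened depth feature map
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical; learned in practice)
    Returns the attended depth features (N, D) and the attention weights (N, M).
    """
    q = rgb_tokens @ w_q
    k = depth_tokens @ w_k
    v = depth_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1 over depth tokens
    return weights @ v, weights


def fuse(rgb_tokens, depth_tokens, w_q, w_k, w_v):
    """Add depth context to RGB features through a residual connection (needs D == C)."""
    attended, weights = cross_modal_attention(rgb_tokens, depth_tokens, w_q, w_k, w_v)
    return rgb_tokens + attended, weights


# Usage with random features standing in for real encoder outputs.
rng = np.random.default_rng(0)
C = 8
rgb = rng.standard_normal((16, C))    # e.g. a 4x4 RGB feature map, flattened
depth = rng.standard_normal((16, C))  # matching depth feature map, flattened
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
fused, attn = fuse(rgb, depth, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (16, 8) (16, 16)
```

Because the attention weights are computed from the inputs themselves, the amount of depth context each RGB location receives changes from image to image, which is one way to read the "dynamic" fusion described in the abstract.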

## Key findings

- The method shows significant improvements over existing techniques on five benchmark datasets in MaxF (maximum F-measure), MAE (mean absolute error), and S-measure (structure measure).
- It is especially strong in challenging scenarios with complex backgrounds, small targets, and multiple salient objects.
- Lightweight LoRA fine-tuning with frozen pre-trained weights keeps computational cost low without sacrificing precision, while cross-modal attention dynamically fuses RGB and depth features.
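
MAE and MaxF can be computed per image as in the hedged sketch below. It follows conventions common in salient object detection rather than the paper's own evaluation code: MAE averages the absolute difference between a saliency map in [0, 1] and a binary ground-truth mask, and MaxF takes the best F-beta score over binarization thresholds with β² = 0.3. The threshold count and the function names `mae` and `max_f_measure` are assumptions, and benchmark toolkits often take the maximum of the dataset-averaged F curve rather than per-image maxima. S-measure, which combines object-aware and region-aware structural similarity, is omitted for brevity.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(pred.astype(float) - gt.astype(float))))


def max_f_measure(pred, gt, beta2=0.3, num_thresholds=256):
    """Maximum F-beta score over uniformly spaced binarization thresholds.

    beta2 = 0.3 is the weighting conventionally used in salient object detection,
    which emphasizes precision over recall.
    """
    gt = gt.astype(bool)
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds):
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        precision = tp / binary.sum() if binary.sum() > 0 else 0.0
        recall = tp / gt.sum() if gt.sum() > 0 else 0.0
        denom = beta2 * precision + recall
        if denom > 0:
            best = max(best, (1 + beta2) * precision * recall / denom)
    return float(best)


# Usage on a toy 2x2 example (hypothetical data).
gt = np.array([[0, 1], [1, 0]])
pred = np.array([[0.2, 0.8], [0.6, 0.0]])
print(mae(pred, gt), max_f_measure(pred, gt))
```

Lower MAE and higher MaxF are better; in this toy case some threshold between 0.2 and 0.6 separates the two foreground pixels perfectly, so MaxF reaches its ceiling even though MAE is nonzero.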

## Abstract

Accurate segmentation of salient objects is crucial for computer vision applications such as image editing, autonomous driving, and object detection. While research on using depth information (RGB-D) in saliency detection is gaining significant attention, its broad application is limited by dependence on depth sensors and by the challenge of effectively integrating RGB and depth information. To address these issues, we propose a salient object segmentation method that integrates the Segment Anything Model (SAM), depth information, and cross-modal attention mechanisms. Our approach leverages SAM for robust feature extraction and combines it with a pre-trained depth estimation network to capture geometric information. By dynamically fusing features from the RGB and depth modalities through a cross-modal attention mechanism, our method better handles diverse scenes. Additionally, we achieve computational efficiency without compromising precision by employing lightweight LoRA fine-tuning and freezing pre-trained weights. A UNet decoder refines the segmentation output, preserving target boundary details in high-resolution outputs. Experiments on five challenging benchmark datasets validate the effectiveness of the proposed method. Results show significant improvements over existing methods across key evaluation metrics, including MaxF, MAE, and S-measure. Particularly in tasks involving complex backgrounds, small targets, and multiple salient objects, our method demonstrates superior performance and robustness. The significance of this work lies in advancing depth-guided RGB salient object segmentation while offering new insights into overcoming depth sensor dependency. Furthermore, it opens new pathways for the effective fusion of cross-modal information, contributing to the broader development and diversification of related technologies and their applications.
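
The efficiency claim rests on LoRA's low-rank update. The sketch below implements the generic LoRA formulation in NumPy: a frozen weight `W` plus a trainable update `(alpha / r) * B @ A`, with `B` initialized to zero so the adapted layer initially matches the pre-trained one. The rank, scaling, layer size, and which SAM layers receive adapters are not given in this summary, so `r=4`, `alpha=8.0`, the 768 × 768 layer, and the class name `LoRALinear` are hypothetical choices made only to show the parameter arithmetic.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA).

    y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
    W (d_out, d_in) stays frozen; only A (r, d_in) and B (d_out, r) would be trained.
    """

    def __init__(self, weight, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight                            # frozen pre-trained weight
        d_out, d_in = weight.shape
        self.a = 0.01 * rng.standard_normal((r, d_in))  # small random init
        self.b = np.zeros((d_out, r))                   # zero init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self):
        return self.a.size + self.b.size

    def merged_weight(self):
        # After training, fold the update into W so inference adds no extra cost.
        return self.weight + self.scale * self.b @ self.a


# Usage: a hypothetical 768 x 768 projection adapted with rank 4.
d = 768
w = np.random.default_rng(1).standard_normal((d, d))
layer = LoRALinear(w, r=4)
x = np.ones((2, d))
print(np.allclose(layer(x), x @ w.T))    # True: B = 0, so output equals the frozen layer
print(layer.trainable_params(), w.size)  # 6144 589824
```

With `r=4` on a 768 × 768 layer, only 2 · 4 · 768 = 6,144 parameters are trainable, about 1% of the 589,824 frozen weights. `merged_weight` shows why inference need not pay extra for the adapter.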


## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12829867/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12829867/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12829867/full.md

---
Source: https://tomesphere.com/paper/PMC12829867