SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection

Jia Lin; Xiaofei Zhou; Jiyuan Liu; Runmin Cong; Guodao Zhang; Zhi Liu; Jiyong Zhang

arXiv:2511.09870·cs.CV·November 14, 2025

TL;DR

This paper introduces SAM-DAQ, an RGB-D video salient object detection method that integrates depth cues and temporal memory into SAM2 and is reported to outperform existing state-of-the-art methods on three datasets.

Contribution

The paper proposes a unified framework that combines depth-guided adaptive queries with a temporal memory module, adapting a frozen SAM2 encoder through parallel adapters for RGB-D video salient object detection.

Findings

01. Outperforms state-of-the-art methods on three datasets
02. Effectively integrates depth and temporal cues
03. Maintains high accuracy with prompt-free fine-tuning

Abstract

Recently, the Segment Anything Model (SAM) has attracted widespread attention, and it is often treated as a vision foundation model for universal segmentation. Some researchers have attempted to apply this foundation model directly to the RGB-D video salient object detection (RGB-D VSOD) task, which raises three challenges: the dependence on manual prompts, the high memory consumption of sequential adapters, and the computational burden of memory attention. To address these limitations, we propose a novel method, the Segment Anything Model with Depth-guided Adaptive Queries (SAM-DAQ), which adapts SAM2 to pop out salient objects from videos by seamlessly integrating depth and temporal cues within a unified framework. First, we deploy a parallel adapter-based multi-modal image encoder (PAMIE), which incorporates several depth-guided parallel adapters (DPAs) in a…
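The abstract is cut off before it describes how the depth-guided parallel adapters are built, so the following is only a generic sketch of the parallel-adapter idea, not the authors' PAMIE or DPA design. It assumes a frozen layer modeled as a single linear map, a small bottleneck branch that mixes RGB and depth tokens, and a zero-initialized up-projection; every name here (`frozen_block`, `ParallelAdapter`, `adapted_block`) and every dimension is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def frozen_block(x, w_frozen):
    """Stand-in for one frozen encoder layer (assumption: a single linear map)."""
    return x @ w_frozen


class ParallelAdapter:
    """Illustrative bottleneck branch that runs beside a frozen layer.

    Hypothetical design, not taken from the paper: RGB and depth tokens are
    projected down, summed, passed through ReLU, and projected back up.
    """

    def __init__(self, dim, bottleneck, rng):
        self.w_down_rgb = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.w_down_depth = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        # Zero init: the branch contributes nothing until it is trained.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, rgb_tokens, depth_tokens):
        hidden = rgb_tokens @ self.w_down_rgb + depth_tokens @ self.w_down_depth
        hidden = np.maximum(hidden, 0.0)  # ReLU
        return hidden @ self.w_up


def adapted_block(rgb_tokens, depth_tokens, w_frozen, adapter):
    """Frozen path plus the parallel adapter path, added together."""
    return frozen_block(rgb_tokens, w_frozen) + adapter(rgb_tokens, depth_tokens)


dim, bottleneck, num_tokens = 8, 2, 4
w_frozen = rng.normal(size=(dim, dim))          # never updated
rgb_tokens = rng.normal(size=(num_tokens, dim))
depth_tokens = rng.normal(size=(num_tokens, dim))

adapter = ParallelAdapter(dim, bottleneck, rng)
out = adapted_block(rgb_tokens, depth_tokens, w_frozen, adapter)
```

In this toy setup the adapter holds 48 parameters against 64 in the frozen map, and because the up-projection starts at zero, the adapted layer initially reproduces the frozen layer's output exactly. A sequential adapter would instead sit inside the frozen path, which is the arrangement the abstract associates with high memory consumption.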

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Human Pose and Action Recognition