ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

Junhao Lin; Lei Zhu; Jiaxing Shen; Huazhu Fu; Qing Zhang; Liansheng Wang

arXiv:2406.12536·cs.CV·September 20, 2024

TL;DR

This paper introduces ViDSOD-100, a new RGB-D video dataset with high-quality annotations, and proposes ATF-Net, a novel model that effectively integrates appearance, motion, and depth information for improved salient object detection in videos.

Contribution

The paper presents a new annotated RGB-D video dataset and a baseline model that fuses multiple modalities for enhanced video saliency detection.

Findings

01 ATF-Net outperforms existing methods on the ViDSOD-100 and DAVSOD datasets.

02 The multi-modality fusion approach improves detection accuracy.

03 Experimental results demonstrate significant performance gains over state-of-the-art techniques.

Abstract

With the rapid development of depth sensors, more and more RGB-D videos can be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, existing salient object detection (SOD) works focus only on either static RGB-D images or RGB videos, ignoring how RGB-D and video information can be combined. In this paper, we first collect a new annotated RGB-D video SOD dataset (ViDSOD-100), which contains 100 videos with a total of 9,362 frames, acquired from diverse natural scenes. Every frame in each video is manually annotated with a high-quality saliency annotation. Moreover, we propose a new baseline model, named attentive triple-fusion network (ATF-Net), for RGB-D video salient object detection. Our method aggregates the appearance information from an input RGB image, spatio-temporal information from an estimated motion map, and the geometry…

Code & Models

Repositories

jhl-det/rgbd_video_sod (PyTorch, official)
