Depth-Cooperated Trimodal Network for Video Salient Object Detection

Yukang Lu; Dingyao Min; Keren Fu; Qijun Zhao

arXiv:2202.06060 · cs.CV · July 12, 2022

TL;DR

This paper introduces DCTNet, a depth-cooperated trimodal network for video salient object detection (VSOD) that fuses RGB, depth, and optical flow through a multi-modal attention module (MAM) and a refinement fusion module (RFM) to improve detection accuracy.

Contribution

The paper pioneers the incorporation of depth information into video salient object detection through a multi-modal attention module and a refinement fusion module, enhancing detection performance.

Findings

1. Outperforms 12 state-of-the-art methods on five benchmark datasets.
2. Demonstrates the effectiveness of depth information in VSOD.
3. Validates the necessity of depth for improved detection accuracy.

Abstract

Depth can provide useful geometric cues for salient object detection (SOD) and has proven helpful in recent RGB-D SOD methods. However, existing video salient object detection (VSOD) methods use only spatiotemporal information and seldom exploit depth. In this paper, we propose a depth-cooperated trimodal network for VSOD, called DCTNet, which is a pioneering work in incorporating depth information to assist VSOD. To this end, we first generate depth from RGB frames and then propose an approach that treats the three modalities unequally. Specifically, a multi-modal attention module (MAM) is designed to model multi-modal long-range dependencies between the main modality (RGB) and the two auxiliary modalities (depth and optical flow). We also introduce a refinement fusion module (RFM) to suppress noise in each modality and select useful information…
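The abstract describes treating RGB as the main modality and depth and optical flow as auxiliaries, with MAM modelling long-range dependencies between them. Below is a minimal sketch of that general idea, assuming generic scaled dot-product cross-attention (RGB features as queries, auxiliary features as keys and values) followed by a residual sum onto the RGB stream. This is an illustrative assumption, not DCTNet's published MAM, and all function names are hypothetical. Features are flattened to N spatial positions by C channels, and the code uses plain Python to stay dependency-free.

```python
import math
from typing import List

# Hypothetical helpers: a generic cross-attention sketch, not DCTNet's published MAM.
Matrix = List[List[float]]  # rows = spatial positions, columns = channels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_modal_attention(rgb: Matrix, aux: Matrix) -> Matrix:
    """Every RGB position (query) attends over all auxiliary positions (keys/values).

    Returns an N x C matrix: for each RGB position, a weighted sum of auxiliary features,
    so each output can draw on any location in the frame (a long-range dependency).
    """
    scale = 1.0 / math.sqrt(len(rgb[0]))
    scores = [[s * scale for s in row] for row in matmul(rgb, transpose(aux))]  # N x N_aux
    weights = softmax_rows(scores)
    return matmul(weights, aux)


def trimodal_fusion(rgb: Matrix, depth: Matrix, flow: Matrix) -> Matrix:
    """Unequal treatment (assumed form): RGB is the residual base, and depth and flow
    only contribute context gathered by RGB-driven attention."""
    from_depth = cross_modal_attention(rgb, depth)
    from_flow = cross_modal_attention(rgb, flow)
    return [
        [r + d + f for r, d, f in zip(r_row, d_row, f_row)]
        for r_row, d_row, f_row in zip(rgb, from_depth, from_flow)
    ]


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    depth = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]]
    flow = [[0.0, 0.0]] * 3  # an uninformative auxiliary input
    for row in trimodal_fusion(rgb, depth, flow):
        print([round(v, 3) for v in row])
```

Because RGB serves as both the query source and the residual base, an uninformative auxiliary input (here an all-zero flow map) adds nothing, and the output is RGB plus depth context. This is one simple way to express asymmetric treatment of the three modalities; the paper's actual module design may differ.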

Taxonomy

Topics: Visual Attention and Saliency Detection · Virtual Reality Applications and Impacts · Face Recognition and Perception