Guidance and Teaching Network for Video Salient Object Detection

Yingxia Jiao; Xiao Wang; Yu-Cheng Chou; Shouyuan Yang; Ge-Peng Ji; Rong Zhu; Ge Gao

arXiv:2105.10110 · cs.CV · June 8, 2021 · 1 citation

Open Access

TL;DR

The paper introduces GTNet, an architecture for video salient object detection that captures spatial-temporal cues through implicit guidance at the feature level and explicit teaching at the decision level, aiming to improve accuracy in complex and noisy scenes.

Contribution

It proposes a guidance and teaching network that decouples spatial and temporal cues and fuses cross-modal (appearance and motion) features to improve video salient object detection.

Findings

01

Achieves competitive results on three benchmarks.

02

Runs at approximately 28 fps on a single GPU.

03

Outperforms 14 state-of-the-art methods.

Abstract

Owing to the difficulties of mining spatial-temporal cues, existing approaches for video salient object detection (VSOD) are limited in understanding complex and noisy scenarios and often fail to infer prominent objects. To alleviate these shortcomings, we propose a simple yet efficient architecture, termed Guidance and Teaching Network (GTNet), to independently distil effective spatial and temporal cues with implicit guidance and explicit teaching at the feature and decision levels, respectively. Specifically, we (a) introduce a temporal modulator to implicitly bridge features from the motion branch into the appearance branch, fusing cross-modal features collaboratively, and (b) utilise a motion-guided mask to propagate the explicit cues during feature aggregation. This novel learning strategy achieves satisfactory results via decoupling the complex spatial-temporal…
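The abstract names two mechanisms but gives no layer-level detail. The following is a minimal, parameter-free sketch of one way to read them, not GTNet's implementation: (a) a temporal modulator computed as channel and spatial attention from motion features, applied to appearance features with a residual path, and (b) a motion-guided mask that re-weights aggregated features as f * (1 + m). The attention form, the residual connection, the (1 + m) weighting, and the names `temporal_modulator` and `motion_guided_aggregation` are assumptions for illustration; a trained network would use learned convolutions and multi-scale features. Plain Python lists are used so the sketch has no dependencies.

```python
import math
from typing import List

# Assumed layout: a feature map is a list of C channels,
# each channel a flattened list of H*W values.
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def temporal_modulator(appearance: FeatureMap, motion: FeatureMap) -> FeatureMap:
    """Hypothetical feature-level (implicit) guidance: motion features gate
    appearance features via channel and spatial attention, plus a residual path."""
    num_pixels = len(motion[0])
    # Channel weights: sigmoid of each motion channel's global average.
    channel_w = [sigmoid(sum(ch) / num_pixels) for ch in motion]
    # Spatial weights: sigmoid of the motion features averaged across channels.
    spatial_w = [
        sigmoid(sum(ch[i] for ch in motion) / len(motion)) for i in range(num_pixels)
    ]
    # Residual modulation (assumed): a + a * channel_weight * spatial_weight.
    return [
        [a + a * cw * spatial_w[i] for i, a in enumerate(app_ch)]
        for app_ch, cw in zip(appearance, channel_w)
    ]


def motion_guided_aggregation(
    features: List[FeatureMap], motion_mask: List[float]
) -> FeatureMap:
    """Hypothetical decision-level (explicit) teaching: a motion saliency mask
    with values in [0, 1] re-weights the summed features as f * (1 + mask).
    All feature maps are assumed to share the same shape."""
    num_channels = len(features[0])
    return [
        [sum(f[c][i] for f in features) * (1.0 + m) for i, m in enumerate(motion_mask)]
        for c in range(num_channels)
    ]


# Toy usage: C = 2 channels, H*W = 2 pixels.
appearance = [[1.0, 2.0], [0.5, 0.5]]
motion = [[0.0, 0.0], [0.0, 0.0]]
fused = temporal_modulator(appearance, motion)
print(fused)  # [[1.25, 2.5], [0.625, 0.625]]

mask = [0.0, 1.0]  # second pixel marked as moving
out = motion_guided_aggregation([fused, appearance], mask)
print(out)  # [[2.25, 9.0], [1.125, 2.25]]
```

The residual form keeps the appearance signal intact when motion is uninformative (all attention weights near 0.5 here), while the (1 + m) mask amplifies, rather than zeroes out, regions the motion cue marks as salient.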

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods