Temporal-Spatial Feature Pyramid for Video Saliency Detection

Qinyao Chang; Shiping Zhu

arXiv:2105.04213 · cs.CV · September 15, 2021 · 21 cites

Open Access

TL;DR

This paper introduces a 3D fully convolutional encoder-decoder architecture that combines scale, space, and time information for video saliency detection. The model is reported to run in real time and to outperform existing methods.

Contribution

The paper proposes a novel 3D fully convolutional encoder-decoder model that integrates multi-level features with temporal information for improved video saliency detection.

Findings

01

Outperforms state-of-the-art methods on multiple benchmarks

02

Operates in real time with high accuracy

03

Effectively combines scale, space, and time features

Abstract

Multi-level features are important for saliency detection. Better combination and use of multi-level features with time information can greatly improve the accuracy of a video saliency model. In order to fully combine multi-level features and make them serve the video saliency model, we propose a 3D fully convolutional encoder-decoder architecture for video saliency detection, which combines scale, space and time information for video saliency modeling. The encoder extracts multi-scale temporal-spatial features from the input continuous video frames, and then constructs a temporal-spatial feature pyramid through temporal-spatial convolution and top-down feature integration. The decoder performs hierarchical decoding of temporal-spatial features from different scales, and finally produces a saliency map from the integration of multiple video frames. Our model is simple yet effective, and…
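
The top-down feature integration mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: it assumes a generic FPN-style scheme in which each level is projected to a common channel count by a pointwise (1x1x1) convolution, the coarser level is upsampled spatially by nearest-neighbour repetition, and the two are added. The function names, channel counts, clip length, additive fusion, and the choice to leave the temporal axis untouched are all assumptions made for illustration, and the temporal-spatial convolutions themselves are omitted.

```python
import numpy as np

def lateral_projection(feature, weight):
    """Pointwise (1x1x1) convolution: mixes channels only.

    feature: array of shape (C_in, T, H, W)
    weight:  array of shape (C_out, C_in)
    """
    return np.einsum("oc,cthw->othw", weight, feature)

def upsample_spatial(feature, factor=2):
    """Nearest-neighbour upsampling of H and W; the T axis is unchanged."""
    return feature.repeat(factor, axis=2).repeat(factor, axis=3)

def build_pyramid(features, weights):
    """Fuse features top-down (coarse to fine).

    features: list ordered fine -> coarse, each (C_i, T, H_i, W_i),
              with H and W halving from one level to the next (assumed).
    weights:  one (C_out, C_i) projection matrix per level.
    Returns the fused maps in the same fine -> coarse order.
    """
    laterals = [lateral_projection(f, w) for f, w in zip(features, weights)]
    fused = [laterals[-1]]  # the coarsest level has nothing above it
    for lateral in reversed(laterals[:-1]):
        fused.append(lateral + upsample_spatial(fused[-1]))
    return fused[::-1]

# Hypothetical encoder outputs: 3 levels, 4 frames, random values.
rng = np.random.default_rng(0)
channels = [8, 16, 32]
sizes = [32, 16, 8]
num_frames = 4
features = [rng.standard_normal((c, num_frames, s, s))
            for c, s in zip(channels, sizes)]
weights = [rng.standard_normal((8, c)) for c in channels]

pyramid = build_pyramid(features, weights)
shapes = [p.shape for p in pyramid]
```

Under these assumptions every pyramid level ends up with the same channel count while keeping its own spatial resolution, which is what lets a decoder treat the levels uniformly when decoding hierarchically.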

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image and Video Quality Assessment

Methods: Convolution