Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Ziqiang Wang; Zhi Liu; Gongyang Li; Yang Wang; Tianhong Zhang; Lihua Xu; Jijun Wang

arXiv:2108.10696 · cs.CV · January 19, 2022

TL;DR

This paper introduces STSANet, a novel video saliency prediction model that employs spatio-temporal self-attention modules to capture long-range relations across different time steps, outperforming existing methods.

Contribution

The paper proposes a new spatio-temporal self-attention network with multi-scale feature fusion for improved video saliency prediction.

Findings

1. Outperforms state-of-the-art models on DHF1K, Hollywood-2, UCF, and DIEM datasets.

2. Effectively captures long-range spatio-temporal relations.

3. Demonstrates the importance of multi-level feature integration.

Abstract

3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction, which is explored in this paper. However, 3D convolution encodes visual representations only within a fixed local space-time window determined by its kernel size, while human attention is always attracted by relational visual features at different times. To overcome this limitation, we propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction, in which multiple Spatio-Temporal Self-Attention (STSA) modules are employed at different levels of a 3D convolutional backbone to directly capture long-range relations between spatio-temporal features of different time steps. Besides, we propose an Attentional Multi-Scale Fusion (AMSF) module to integrate multi-level features with the perception of context in semantic and spatio-temporal…
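To make the contrast with fixed-kernel 3D convolution concrete, the following is a minimal sketch of generic scaled dot-product self-attention applied over all spatio-temporal positions of a feature volume. It is not the paper's STSA module, whose exact design is not given in the abstract. The function name `spatio_temporal_self_attention`, the tensor shapes, the random projection matrices, and the residual connection are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_self_attention(feats, w_q, w_k, w_v):
    """Generic self-attention over every (time, height, width) position.

    feats: array of shape (T, H, W, C), e.g. a 3D-CNN feature volume.
    w_q, w_k: (C, D) projection matrices (hypothetical, random here).
    w_v: (C, C) projection matrix, so the residual sum is well defined.
    Returns the attended features (T, H, W, C) and the (N, N) attention
    matrix, where N = T * H * W.
    """
    t, h, w, c = feats.shape
    tokens = feats.reshape(t * h * w, c)  # one token per space-time position

    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # Every position is compared with every other position, including
    # positions in distant frames, so the receptive field is not limited
    # by a kernel size as it is in 3D convolution.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)

    out = tokens + attn @ v  # residual connection (an assumption)
    return out.reshape(t, h, w, c), attn


rng = np.random.default_rng(0)
T, H, W, C, D = 4, 3, 3, 8, 8
feats = rng.standard_normal((T, H, W, C))
w_q = rng.standard_normal((C, D)) * 0.1
w_k = rng.standard_normal((C, D)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

out, attn = spatio_temporal_self_attention(feats, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

In this sketch, row `i` of `attn` holds the weights that position `i` assigns to every other position, so a token in the first frame can draw directly on tokens in the last frame. The cost is an attention matrix that grows quadratically with `T * H * W`, which is one reason such modules are typically applied to downsampled backbone features.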

Code & Models

Repositories

come880412/STSANet (PyTorch)

Taxonomy

Methods: Convolution · 3D Convolution