Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, and Ali Borji

TL;DR
This paper introduces a comprehensive new benchmark dataset for video saliency prediction and proposes a novel CNN-LSTM model with an attention mechanism that improves performance and training efficiency.
Contribution
The work provides a large-scale, diverse video saliency dataset and a new attention-augmented model that enhances temporal saliency learning and leverages static fixation data.
Findings
Our model outperforms state-of-the-art methods on three large-scale datasets.
The DHF1K dataset significantly advances diversity and difficulty in video saliency benchmarks.
The attention mechanism sharpens the model's focus during saliency learning and makes training more efficient.
Abstract
In this work, we contribute to video saliency research in two ways. First, we introduce a new benchmark for predicting human eye movements during free-viewing of dynamic scenes, something that has long been needed in this field. Our dataset, named DHF1K (Dynamic Human Fixation), consists of 1K high-quality, carefully selected video sequences spanning a wide range of scenes, motions, object types and background complexity. Existing video saliency datasets lack the variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments. In contrast, DHF1K makes a significant leap in scalability, diversity and difficulty, and is expected to boost video saliency modeling. Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. The…
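The abstract describes the attention mechanism only at a high level. As an illustration, here is a minimal NumPy sketch of one common way to apply a spatial attention map to CNN features: a 1×1 convolution followed by a sigmoid produces a per-location weight A in (0, 1), and the features are rescaled residually as X ⊙ (1 + A). The function names `spatial_attention` and `modulate`, the single-layer attention head, and the residual form are assumptions for illustration, not necessarily the authors' exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(features, w, b=0.0):
    """Hypothetical attention head (assumption): a 1x1 convolution, i.e. a
    per-pixel dot product over channels, followed by a sigmoid.
    features: (C, H, W), w: (C,) -> attention map of shape (H, W) in (0, 1)."""
    logits = np.tensordot(w, features, axes=([0], [0])) + b
    return sigmoid(logits)

def modulate(features, attention):
    """Residual modulation X * (1 + A): attended locations are amplified
    (up to 2x), while unattended ones keep their original response
    instead of being suppressed to zero."""
    return features * (1.0 + attention[np.newaxis, :, :])

# Usage on random stand-in "CNN features" for one frame.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))   # C=8 channels on a 4x4 grid
w = rng.standard_normal(8)               # hypothetical attention weights
att = spatial_attention(feats, w)
out = modulate(feats, att)
print(att.shape, out.shape)  # (4, 4) (8, 4, 4)
```

The residual form is a design choice worth noting: multiplying by A alone could zero out features wherever the attention map is wrong, whereas 1 + A only reweights them.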
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Virtual Reality Applications and Impacts
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
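The listed methods (sigmoid, tanh, LSTM) fit together in the gated update of an LSTM cell. The sketch below shows one step of a convolutional LSTM, assuming the CNN-LSTM keeps the spatial layout of the feature maps. To stay short it uses 1×1 kernels; the names `conv1x1`, `convlstm_step`, and the `params` layout are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(w, x):
    """1x1 convolution: w (C_out, C_in), x (C_in, H, W) -> (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def convlstm_step(x, h, c, params):
    """One ConvLSTM update (simplified to 1x1 kernels, an assumption).
    x: (C_in, H, W); h, c: (C_hid, H, W).
    params maps gate name -> (W_x, W_h, b) with b of shape (C_hid,)."""
    pre = {}
    for name in ("i", "f", "o", "g"):
        w_x, w_h, b = params[name]
        pre[name] = conv1x1(w_x, x) + conv1x1(w_h, h) + b[:, None, None]
    i = sigmoid(pre["i"])        # input gate: how much new content to write
    f = sigmoid(pre["f"])        # forget gate: how much old memory to keep
    o = sigmoid(pre["o"])        # output gate: how much memory to expose
    g = np.tanh(pre["g"])        # candidate cell content
    c_next = f * c + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

# Usage: run three frames of random stand-in features through the cell.
rng = np.random.default_rng(0)
C_in, C_hid, H, W = 8, 4, 5, 5
params = {k: (0.1 * rng.standard_normal((C_hid, C_in)),
              0.1 * rng.standard_normal((C_hid, C_hid)),
              np.zeros(C_hid)) for k in "ifog"}
h = np.zeros((C_hid, H, W))
c = np.zeros_like(h)
for t in range(3):
    frame_feats = rng.standard_normal((C_in, H, W))
    h, c = convlstm_step(frame_feats, h, c, params)
print(h.shape)  # (4, 5, 5)
```

A practical ConvLSTM would use larger kernels (for example 3×3) so that each hidden unit also aggregates neighboring locations across time; the 1×1 version here only illustrates the gating arithmetic.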
