Revisiting Video Saliency: A Large-scale Benchmark and a New Model

Wenguan Wang; Jianbing Shen; Fang Guo; Ming-Ming Cheng; Ali Borji

arXiv:1801.07424 · cs.CV · May 29, 2018 · 42 cites


TL;DR

This paper introduces a comprehensive new benchmark dataset for video saliency prediction and proposes a novel CNN-LSTM model with an attention mechanism that improves performance and training efficiency.

Contribution

The work provides a large-scale, diverse video saliency dataset and a new attention-augmented model that enhances temporal saliency learning and leverages static fixation data.

Findings

1. Our model outperforms state-of-the-art methods on three large-scale datasets.
2. The DHF1K dataset significantly advances the diversity and difficulty of video saliency benchmarks.
3. The attention mechanism improves the focus and efficiency of saliency learning.

Abstract

In this work, we contribute to video saliency research in two ways. First, we introduce a new benchmark for predicting human eye movements during free viewing of dynamic scenes, something the field has long called for. Our dataset, named DHF1K (Dynamic Human Fixation), consists of 1K high-quality, carefully selected video sequences spanning a wide range of scenes, motions, object types and background complexity. Existing video saliency datasets lack the variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments. In contrast, DHF1K makes a significant leap in scalability, diversity and difficulty, and is expected to boost video saliency modeling. Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. The…
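To make the attention idea more concrete, here is a minimal NumPy sketch, not the authors' implementation, of one way a spatial attention map can reweight per-frame CNN features before they are passed to a recurrent layer. The 1×1-convolution weights `w`, the bias `b`, and the residual form `features * (1 + attention)` are illustrative assumptions; the abstract above does not specify the exact formulation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def attention_modulate(features: np.ndarray, w: np.ndarray, b: float) -> tuple[np.ndarray, np.ndarray]:
    """Reweight a CNN feature map with a learned spatial attention map.

    features: (C, H, W) feature map for one frame.
    w:        (C,) weights of a hypothetical 1x1 convolution.
    b:        scalar bias of that convolution.
    """
    # 1x1 convolution: collapse channels into one attention logit per pixel -> (H, W)
    logits = np.tensordot(w, features, axes=([0], [0])) + b
    attention = sigmoid(logits)  # values in (0, 1)
    # Residual modulation (assumed form): boost attended locations, never zero out others
    modulated = features * (1.0 + attention)[None, :, :]
    return modulated, attention


# Usage: modulate a short clip frame by frame; the results would feed an LSTM.
rng = np.random.default_rng(0)
clip = [rng.random((8, 6, 10)) for _ in range(3)]  # 3 frames, 8 channels, 6x10 grid
w, b = rng.standard_normal(8), 0.0
modulated_clip = [attention_modulate(frame, w, b)[0] for frame in clip]
```

The residual form keeps unattended features intact (multiplied by a factor between 1 and 2), which is one common way to avoid an attention map suppressing useful context early in training.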


Code & Models

Repositories

wenguanwang/DHF1K
TensorFlow · Official


Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Virtual Reality Applications and Impacts

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
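The Methods tags list sigmoid and tanh activations alongside LSTM because the LSTM gates are built from them. The following minimal NumPy sketch of one standard LSTM step shows where each activation appears. It uses dense matrix products for brevity; a convolutional LSTM variant would replace them with convolutions over feature maps. The parameter names and shapes here are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step.

    x: (D,) input; h_prev, c_prev: (H,) previous hidden and cell state;
    W: (4H, D) input weights; U: (4H, H) recurrent weights; b: (4H,) bias.
    """
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)                   # input, forget, output gates + candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate values in (-1, 1)
    c = f * c_prev + i * g                        # keep part of old memory, add new content
    h = o * np.tanh(c)                            # exposed hidden state
    return h, c


# Usage: run the cell over a short sequence of per-frame feature vectors.
rng = np.random.default_rng(1)
D, H = 3, 2
W, U, b = rng.standard_normal((4 * H, D)), rng.standard_normal((4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```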