Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

Lai Jiang; Mai Xu; Zulin Wang

arXiv:1709.06316·cs.CV·January 16, 2019·73 cites

TL;DR

This paper proposes a DNN-based method for predicting video saliency that combines an object-to-motion CNN (OM-CNN) with a two-layer convolutional LSTM. The model is trained on LEDOV, a new large-scale eye-tracking video database, and the authors report state-of-the-art results.

Contribution

The paper presents a new large-scale eye-tracking video database and a combined CNN-LSTM model that captures object motion and temporal attention transitions for improved video saliency prediction.
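To make the "CNN features into a two-layer convolutional LSTM" idea concrete, here is a minimal NumPy sketch of a generic ConvLSTM cell stacked twice over per-frame feature maps. It is not the authors' architecture: the names `conv2d_same`, `ConvLSTMCell` and `run_two_layer`, the kernel size, the channel counts, the random untrained weights, and the 1x1 sigmoid readout are all assumptions made for illustration, and the input merely stands in for features that a network such as OM-CNN would produce.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) with odd k.
    """
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMCell:
    """Generic ConvLSTM cell (no peephole terms); weights are random, not trained."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One kernel bank yields all four gates from the concatenation [x, h].
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))

    def step(self, x, h, c):
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h_new = sigmoid(o) * np.tanh(c_new)               # expose hidden state
        return h_new, c_new


def run_two_layer(features, hid_ch=4):
    """features: (T, C, H, W) per-frame features -> (T, H, W) maps in (0, 1)."""
    _, c, h, w = features.shape
    layer1 = ConvLSTMCell(c, hid_ch, seed=0)
    layer2 = ConvLSTMCell(hid_ch, hid_ch, seed=1)
    # Hypothetical 1x1 readout that collapses hidden channels to one map.
    readout = np.random.default_rng(2).normal(0.0, 0.1, size=(1, hid_ch, 1, 1))
    h1, c1 = np.zeros((hid_ch, h, w)), np.zeros((hid_ch, h, w))
    h2, c2 = np.zeros((hid_ch, h, w)), np.zeros((hid_ch, h, w))
    maps = []
    for frame in features:  # states carry information across frames
        h1, c1 = layer1.step(frame, h1, c1)
        h2, c2 = layer2.step(h1, h2, c2)
        maps.append(sigmoid(conv2d_same(h2, readout))[0])
    return np.stack(maps)


if __name__ == "__main__":
    demo = np.random.default_rng(0).normal(size=(3, 2, 5, 6))
    print(run_two_layer(demo).shape)
```

Because the hidden and cell states persist from frame to frame, each output map depends on earlier frames as well as the current one, which is the mechanism a recurrent layer offers for modelling temporal transitions of attention.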

Findings

01. The proposed method outperforms existing models in saliency prediction accuracy.

02. Human attention is primarily attracted by moving objects and their parts.

03. Saliency transitions smoothly across video frames, as captured by the model.

Abstract

Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that…

Code & Models

Repositories

remega/LEDOV-eye-tracking-database

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Visual perception and processing mechanisms