Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM
Lai Jiang, Mai Xu, Zulin Wang

TL;DR
This paper proposes a DNN-based method for predicting video saliency that combines an object-to-motion CNN (OM-CNN) with a two-layer convolutional LSTM. The model is trained on a new large-scale eye-tracking database of videos (LEDOV) and achieves state-of-the-art results.
Contribution
The paper presents a new large-scale eye-tracking video database and a combined CNN-LSTM model that captures object motion and temporal attention transitions for improved video saliency prediction.
Findings
The proposed method outperforms existing models in saliency prediction accuracy.
Human attention is normally attracted by objects, particularly moving objects or the moving parts of objects.
Saliency transitions smoothly across video frames, as captured by the model.
Abstract
Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that…
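To make the temporal half of the architecture concrete, the following is a minimal NumPy sketch of a two-layer convolutional LSTM run over per-frame feature maps. It is an illustration under stated assumptions, not the authors' implementation: the feature maps are random stand-ins for OM-CNN outputs, the gates follow the standard ConvLSTM formulation without peephole connections, and all names (`conv2d_same`, `ConvLSTMCell`, `predict_saliency`), channel counts, and kernel sizes are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    # win has shape (C_in, H, W, k, k): one k-by-k patch per output pixel.
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", win, w)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Standard ConvLSTM cell (no peepholes); sizes are illustrative assumptions."""
    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_ch = hid_ch
        # One kernel yields all four gates from [input, hidden] stacked on channels.
        self.w = rng.normal(0.0, 0.1, (4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))

    def step(self, x, h, c):
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def predict_saliency(features, cells, w_out):
    """features: (T, C, H, W) per-frame feature maps.
    Returns (T, H, W) maps with values in (0, 1)."""
    T, _, H, W = features.shape
    states = [(np.zeros((cell.hid_ch, H, W)), np.zeros((cell.hid_ch, H, W)))
              for cell in cells]
    maps = []
    for t in range(T):
        x = features[t]
        for n, cell in enumerate(cells):
            h, c = cell.step(x, *states[n])
            states[n] = (h, c)   # state carried to the next frame
            x = h                # hidden state of layer n feeds layer n + 1
        maps.append(sigmoid(conv2d_same(x, w_out))[0])
    return np.stack(maps)

# Usage: 5 frames of random 4-channel 8x8 features (stand-ins for CNN output).
rng = np.random.default_rng(1)
features = rng.normal(size=(5, 4, 8, 8))
cells = [ConvLSTMCell(4, 6, seed=0), ConvLSTMCell(6, 6, seed=1)]
w_out = rng.normal(0.0, 0.1, (1, 6, 1, 1))  # 1x1 conv to a single saliency channel
saliency = predict_saliency(features, cells, w_out)
```

Because each layer's hidden and cell states persist from one frame to the next, the map for frame t depends on earlier frames, which is the mechanism a recurrent model uses to represent temporal correlation in attention.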
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Visual perception and processing mechanisms
