Video Saliency Detection by 3D Convolutional Neural Networks
Guanqun Ding, and Yuming Fang

TL;DR
This paper introduces a novel 3D convolutional neural network approach for video saliency detection, effectively capturing spatial and temporal features to improve saliency map prediction.
Contribution
The paper proposes a new Conv3DNet and Deconv3DNet architecture specifically designed for video saliency detection, integrating spatiotemporal features more effectively than prior methods.
Findings
Outperforms existing video saliency detection methods
Effectively captures spatiotemporal features using 3D CNNs
Provides more accurate saliency maps for video sequences
Abstract
Different from salient object detection methods for still images, a key challenging for video saliency detection is how to extract and combine spatial and temporal features. In this paper, we present a novel and effective approach for salient object detection for video sequences based on 3D convolutional neural networks. First, we design a 3D convolutional network (Conv3DNet) with the input as three video frame to learn the spatiotemporal features for video sequences. Then, we design a 3D deconvolutional network (Deconv3DNet) to combine the spatiotemporal features to predict the final saliency map for video sequences. Experimental results show that the proposed saliency detection model performs better in video saliency prediction compared with the state-of-the-art video saliency detection methods.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVisual Attention and Saliency Detection · Image and Video Quality Assessment · Advanced Image and Video Retrieval Techniques
