CNN-Based Prediction of Frame-Level Shot Importance for Video Summarization
Mohaiminul Al Nahian, A. S. M. Iftekhar, Mohammad Tariqul Islam, S. M., Mahbubur Rahman, Dimitrios Hatzinakos

TL;DR
This paper introduces a CNN-based method for predicting frame-level shot importance in videos, aiming to improve video summarization by capturing scene relevance more effectively than traditional feature-based approaches.
Contribution
The paper proposes a novel CNN architecture trained to estimate shot importance, outperforming traditional methods in accuracy and efficiency for user-oriented video summarization.
Findings
CNN method achieves lower mean absolute error
Outperforms feature-based methods in accuracy
Provides near-instant importance estimation
Abstract
In the Internet, ubiquitous presence of redundant, unedited, raw videos has made video summarization an important problem. Traditional methods of video summarization employ a heuristic set of hand-crafted features, which in many cases fail to capture subtle abstraction of a scene. This paper presents a deep learning method that maps the context of a video to the importance of a scene similar to that is perceived by humans. In particular, a convolutional neural network (CNN)-based architecture is proposed to mimic the frame-level shot importance for user-oriented video summarization. The weights and biases of the CNN are trained extensively through off-line processing, so that it can provide the importance of a frame of an unseen video almost instantaneously. Experiments on estimating the shot importance is carried out using the publicly available database TVSum50. It is shown that the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVideo Analysis and Summarization · Music and Audio Processing · Digital Media Forensic Detection
