A General Framework for Edited Video and Raw Video Summarization
Xuelong Li, Bin Zhao, Xiaoqiang Lu

TL;DR
This paper presents a versatile framework for summarizing both edited and raw videos by modeling importance, representativeness, diversity, and story flow, with learned weights and a combined scoring function.
Contribution
It introduces a unified, supervised learning-based framework that captures multiple properties of video summaries applicable to various video types.
Findings
Effective on multiple datasets including edited and raw videos
Supervised learning of property-weights improves summary quality
Framework outperforms existing methods in experiments
Abstract
In this paper, we build a general summarization framework for both edited video and raw video summarization. Overall, our work has three parts: 1) Four models are designed to capture the properties of video summaries, i.e., containing important people and objects (importance), being representative of the video content (representativeness), containing no similar key-shots (diversity), and having a smooth storyline (storyness). Specifically, these models are applicable to both edited videos and raw videos. 2) A comprehensive score function is built as the weighted combination of the aforementioned four models. Note that the weights of the four models in the score function, denoted as property-weights, are learned in a supervised manner. Besides, the property-weights are learned separately for edited videos and raw videos. 3) The training set is constructed with both edited videos…
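The weighted combination described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each of the four property models has already produced a scalar score for a candidate summary, and it assumes the combination is a plain weighted sum. The names `PropertyWeights` and `summary_score` and all numeric values are hypothetical; in the paper the property-weights are learned in a supervised manner, whereas here they are hard-coded placeholders.

```python
from dataclasses import dataclass

# The four summary properties named in the abstract.
PROPERTIES = ("importance", "representativeness", "diversity", "storyness")


@dataclass(frozen=True)
class PropertyWeights:
    """One weight per property (hypothetical container, not from the paper)."""
    importance: float
    representativeness: float
    diversity: float
    storyness: float


def summary_score(property_scores: dict, weights: PropertyWeights) -> float:
    """Weighted sum of the four per-property scores of one candidate summary.

    Assumption: each property model outputs a single float for the candidate.
    """
    return sum(getattr(weights, name) * property_scores[name] for name in PROPERTIES)


# Placeholder weights. The abstract says separate property-weights are learned
# for edited and raw videos; these numbers are made up for illustration only.
EDITED_WEIGHTS = PropertyWeights(0.5, 0.25, 0.25, 0.0)
RAW_WEIGHTS = PropertyWeights(0.25, 0.25, 0.25, 0.25)

# Made-up property scores for a single candidate summary.
candidate = {
    "importance": 1.0,
    "representativeness": 0.5,
    "diversity": 0.5,
    "storyness": 1.0,
}

edited_score = summary_score(candidate, EDITED_WEIGHTS)
raw_score = summary_score(candidate, RAW_WEIGHTS)
```

Keeping one weight set per video type mirrors the abstract's statement that the property-weights are learned for edited and raw videos respectively, so the same candidate can receive different scores depending on the video type.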
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Algorithms and Data Compression
