From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All
Hongxiang Gu, Viswanathan Swaminathan

TL;DR
This paper introduces ReconstSum, an unsupervised LSTM autoencoder framework that efficiently generates various types of video summaries, such as thumbnails and storyboards, outperforming existing methods.
Contribution
The paper presents a novel unsupervised deep learning approach capable of producing diverse video summaries from a single model, adaptable to multiple presentation formats.
Findings
ReconstSum outperforms state-of-the-art methods in thumbnail generation.
ReconstSum produces higher-quality storyboards than existing methods.
The framework is versatile for different summary types.
Abstract
Video summaries come in many forms, from traditional single-image thumbnails, animated thumbnails, and storyboards to trailer-like video summaries. Content creators use summaries to display the most attractive portion of their videos; users rely on them to quickly evaluate whether a video is worth watching. All forms of summaries are essential to video viewers, content creators, and advertisers. Video content management systems often have to generate multiple versions of summaries that vary in duration and presentational form. We present a framework, ReconstSum, that uses an LSTM-based autoencoder architecture to extract and select a sparse subset of video frames or keyshots that optimally represent the input video in an unsupervised manner. The encoder selects a subset from the input video while the decoder seeks to reconstruct the video from the selection. The goal is to minimize the…
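The select-then-reconstruct idea in the abstract can be illustrated with a small toy sketch. This is not the paper's LSTM autoencoder: it swaps in a plain greedy search over hand-made 2-D frame features, and it assumes a mean-squared reconstruction error as the objective, which the truncated abstract does not spell out. The names `sq_dist`, `reconstruction_error`, and `greedy_select` are hypothetical helpers introduced only for this illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def reconstruction_error(frames, selected):
    """Mean squared error when each frame is replaced by its closest selected frame.

    Assumed stand-in for a learned decoder; not the paper's loss.
    """
    return sum(min(sq_dist(f, frames[i]) for i in selected) for f in frames) / len(frames)

def greedy_select(frames, k):
    """Greedily pick k frame indices, each time adding the frame that lowers the error most."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(frames)) if i not in selected]
        best = min(candidates, key=lambda i: reconstruction_error(frames, selected + [i]))
        selected.append(best)
    return sorted(selected)

# Toy "video": three scenes of four frames each, as noisy 2-D feature vectors.
random.seed(0)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
frames = [(cx + random.uniform(-0.2, 0.2), cy + random.uniform(-0.2, 0.2))
          for cx, cy in centers for _ in range(4)]

thumbnail = greedy_select(frames, 1)   # k = 1: a single representative frame
storyboard = greedy_select(frames, 3)  # k = 3: a short sequence of frames
```

Changing only `k` turns the same selection routine from a thumbnail picker into a storyboard picker, which mirrors the single-model, multiple-format framing of the paper at a toy scale.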
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
