Video Summarization using Deep Semantic Features
Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

TL;DR
This paper introduces a novel video summarization method that leverages deep semantic features to understand diverse Internet videos, improving summary quality through a clustering approach.
Contribution
It proposes a deep neural network that encodes semantic content from videos and descriptions into a common space, enhancing summarization effectiveness.
Findings
Deep semantic features improve summarization accuracy.
Clustering-based method effectively selects key segments.
Evaluation shows advantages over baseline approaches.
Abstract
This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization much harder because prior knowledge is almost never available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions. To generate a video summary, we extract…
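The clustering-based selection mentioned above can be illustrated with a minimal sketch. The abstract is truncated here, so the exact procedure is not stated; this sketch assumes plain k-means over per-segment feature vectors and picks the segment nearest to each cluster center. The function names `kmeans` and `select_segments`, the feature dimensionality, and the random features standing in for deep semantic features are all hypothetical and not taken from the paper.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct segments.
    init = rng.choice(len(features), size=k, replace=False)
    centers = features[init].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every segment to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    # Recompute labels so they match the final centers.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def select_segments(features, k, seed=0):
    """Return sorted indices of the segment closest to each cluster center."""
    features = np.asarray(features, dtype=float)
    centers, labels = kmeans(features, k, seed=seed)
    chosen = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        d = ((features[idx] - centers[j]) ** 2).sum(axis=1)
        chosen.append(int(idx[d.argmin()]))
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical input: 40 video segments, each with a 16-dim feature vector.
    rng = np.random.default_rng(1)
    segment_features = rng.normal(size=(40, 16))
    print(select_segments(segment_features, k=5))
```

The returned indices are in temporal order, so concatenating the corresponding segments would give a summary that preserves the original ordering. The number of clusters `k` controls summary length and is a free parameter in this sketch.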
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Multimedia Communication and Technology
