Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Aidean Sharghi, Jacob S. Laurel, Boqing Gong

TL;DR
This paper introduces a query-focused video summarization approach using a memory network, along with a new dataset and evaluation method emphasizing semantic content over visual overlap, addressing user subjectivity and evaluation challenges.
Contribution
It proposes a memory-network-based model for query-focused (personalized) video summarization and introduces a new dataset and an evaluation metric based on semantic concepts.
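To make "an evaluation metric based on semantic concepts" concrete, here is a minimal sketch. It rests on assumptions the summary above does not spell out. Each shot is represented by a set of concept tags. Two shots are compared by the intersection-over-union (IoU) of their tags. A system summary and a user summary are aligned by a maximum-weight one-to-one matching, and the matched weight is turned into precision, recall, and F1. The function names (`concept_iou`, `max_weight_matching`, `semantic_prf`) are hypothetical. The brute-force matching is only practical for a handful of shots; a real implementation would use the Hungarian algorithm.

```python
from itertools import permutations
from typing import FrozenSet, Sequence, Tuple

# Hypothetical representation: a shot is the set of concept tags annotated for it.
Shot = FrozenSet[str]


def concept_iou(a: Shot, b: Shot) -> float:
    """Semantic similarity of two shots: intersection-over-union of their concept tags."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def max_weight_matching(left: Sequence[Shot], right: Sequence[Shot]) -> float:
    """Total IoU of the best one-to-one matching between two shot lists.

    Brute force over permutations: exponential, so only for small examples.
    IoU is symmetric, so swapping the lists does not change the result.
    """
    if len(left) > len(right):
        left, right = right, left
    best = 0.0
    for chosen in permutations(range(len(right)), len(left)):
        total = sum(concept_iou(left[i], right[j]) for i, j in enumerate(chosen))
        best = max(best, total)
    return best


def semantic_prf(system: Sequence[Shot], user: Sequence[Shot]) -> Tuple[float, float, float]:
    """Assumed scoring: matched weight divided by summary length gives precision/recall."""
    if not system or not user:
        return 0.0, 0.0, 0.0
    matched = max_weight_matching(system, user)
    precision = matched / len(system)
    recall = matched / len(user)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a two-shot system summary tagged `{car, street}` and `{dog}` against a three-shot user summary `{car, street, tree}`, `{dog, grass}`, `{sky}` matches with total weight 2/3 + 1/2, giving precision 7/12, recall 7/18, and F1 7/15. Because the score depends only on tags, two visually different shots showing the same concepts still count as a match. This is the point of evaluating semantic content rather than visual overlap.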
Findings
The proposed model outperforms existing summarizers in semantic relevance.
The new dataset enables more accurate evaluation of semantic content.
The evaluation method aligns better with human perception of summaries.
Abstract
Recent years have witnessed a resurgence of interest in video summarization. However, one of the main obstacles to research on video summarization is user subjectivity: users have different preferences over summaries. This subjectivity causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to individual users. Second, it is very challenging to evaluate the performance of a video summarizer. To tackle the first problem, we explore the recently proposed query-focused video summarization, which introduces user preferences into the summarization process in the form of text queries about the video. We propose a memory-network-parameterized sequential determinantal point process in order to attend the user query onto different video frames and shots. To address the second challenge, we contend that a good evaluation…
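The abstract combines two ideas: attending a query over video frames, and selecting a diverse set of shots with a determinantal point process (DPP). The following is a minimal sketch of how they can fit together. It is not the authors' architecture. The following are all illustrative assumptions: a single attention hop of the query over each shot's frame features (the "memory"), quality `q_i = exp(query · shot_i)`, cosine similarity for diversity, and the kernel `L_ij = q_i * S_ij * q_j`. The sketch also uses a standard (non-sequential) DPP, `P(Y) = det(L_Y) / det(L + I)`, in place of the sequential variant, which conditions each video segment on the previous selection. The helper names (`attend`, `dpp_kernel`, `dpp_log_prob`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, frames: Sequence[Vector]) -> Vector:
    """One attention hop: weight each frame (memory cell) by its softmax match with the query."""
    weights = softmax([dot(query, f) for f in frames])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(len(frames[0]))]


def determinant(m: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting (empty matrix -> 1.0)."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # treat a (near-)zero pivot as singular
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def dpp_kernel(query: Vector, shots: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """L_ij = q_i * S_ij * q_j: query-dependent quality times shot-to-shot similarity."""
    reps = [attend(query, frames) for frames in shots]
    quality = [math.exp(dot(query, r)) for r in reps]
    # Assumes every attended representation has non-zero norm.
    unit = [[x / math.sqrt(dot(r, r)) for x in r] for r in reps]
    n = len(reps)
    return [[quality[i] * dot(unit[i], unit[j]) * quality[j] for j in range(n)] for i in range(n)]


def dpp_log_prob(kernel: List[List[float]], subset: Sequence[int]) -> float:
    """log P(Y) = log det(L_Y) - log det(L + I); -inf for redundant (singular) subsets."""
    sub = [[kernel[i][j] for j in subset] for i in subset]
    n = len(kernel)
    shifted = [[kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    det_sub = determinant(sub)
    if det_sub <= 0.0:
        return float("-inf")
    return math.log(det_sub) - math.log(determinant(shifted))
```

With the query `[1.0, 0.0]`, a shot whose only frame is `[1.0, 0.0]` gets a higher single-shot probability than one whose only frame is `[0.0, 1.0]`. Two identical shots, however, can never be selected together, because their submatrix is singular. The quality term carries the query relevance, and the determinant penalizes redundancy. This division of labor is why a DPP is a natural fit for query-focused summarization.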
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Human Motion and Animation
