Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework
Vishal Kaushal, Suraj Kothawade, Rishabh Iyer, Ganesh Ramakrishnan

TL;DR
This paper introduces VISIOCITY, a new long-video dataset with dense concept annotations, together with a recipe for automatically generating multiple reference summaries and an evaluation framework intended to better reflect human judgment.
Contribution
It presents a new benchmark dataset, a method for generating multiple reference summaries, and an improved evaluation framework for more realistic video summarization assessment.
Findings
The VISIOCITY dataset includes longer videos across six categories.
The automatically generated reference summaries are on par with human summaries.
The new evaluation framework aligns better with human judgment.
Abstract
Automatic video summarization is still an unsolved problem due to several challenges. We take steps towards making automatic video summarization more realistic by addressing them. Firstly, the currently available datasets either have very short videos or have few long videos of only a particular type. We introduce a new benchmarking dataset, VISIOCITY, which comprises longer videos across six different categories with dense concept annotations. These annotations can support different flavors of video summarization and can be used for other vision problems. Secondly, for long videos, human reference summaries are difficult to obtain. We present a novel recipe based on Pareto optimality to automatically generate multiple reference summaries from the indirect ground truth present in VISIOCITY. We show that these summaries are on par with human summaries. Thirdly, we demonstrate that in the presence of…
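The abstract mentions Pareto optimality without giving details here, so the following is only a generic sketch of what Pareto-optimal selection means, not the authors' recipe. It assumes each candidate summary has already been scored on several criteria where higher is better. The function names, the two criteria, and the scores are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every criterion
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical candidate summaries, each scored on two illustrative
# criteria (criterion_a, criterion_b). These numbers are made up.
candidate_scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.5, 0.3)]
front = pareto_front(candidate_scores)
print(front)
```

Candidates on the front represent different trade-offs between the criteria, which is one way a single video could yield several equally defensible reference summaries.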
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Advanced Image and Video Retrieval Techniques
