How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization
Vishal Kaushal, Suraj Kothawade, Anshul Tomar, Rishabh Iyer, Ganesh Ramakrishnan

TL;DR
This paper introduces VISIOCITY, a new long-video dataset with diverse annotations, and proposes an evaluation framework that better aligns with human judgment, advancing the assessment of video summarization methods.
Contribution
The paper presents a novel long-video dataset with dense annotations, strategies for automatically generating multiple reference summaries, and a new evaluation framework for more human-like assessment.
Findings
Multiple diverse ground truth summaries improve learning.
The proposed evaluation correlates better with human judgment.
Learning from multiple references outperforms single-reference methods.
Abstract
Automatic video summarization is still an unsolved problem due to several challenges. The currently available datasets either have very short videos or have few long videos of only a particular type. We introduce a new benchmarking video dataset called VISIOCITY (VIdeo SummarIzatiOn based on Continuity, Intent and DiversiTY), which comprises longer videos across six different categories with dense concept annotations capable of supporting different flavors of video summarization and other vision problems. For long videos, the human reference summaries necessary for supervised video summarization techniques are difficult to obtain. We explore strategies to automatically generate multiple reference summaries from the indirect ground truth present in VISIOCITY. We show that these summaries are on par with human summaries. We also present a study of different desired characteristics of a good…
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Advanced Image and Video Retrieval Techniques
