Does SpatioTemporal information benefit Two video summarization benchmarks?
Aashutosh Ganesh, Mirela Popa, Daan Odijk, Nava Tintarev

TL;DR
This paper investigates whether spatio-temporal information is essential for effective video summarization by analyzing the impact of temporal disruptions on benchmark datasets, finding that static cues may suffice.
Contribution
The study critically assesses the role of temporal information in video summarization benchmarks, revealing that models relying on static cues perform comparably to those using temporal data.
Findings
Temporally invariant models achieve competitive scores on TVSum.
Existing models are not significantly affected by temporal disruptions.
Disrupting temporal order can sometimes improve model performance.
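The findings above can be illustrated with a small frame-shuffling probe. This is a minimal sketch under stated assumptions, not the authors' protocol: the scorers `framewise_scorer` and `smoothing_scorer`, the toy features, and the toy ground-truth scores are all hypothetical, and Kendall's tau is assumed here as one example of a rank-correlation score. The idea is to shuffle the frame order, score the shuffled video, map the scores back to the original frame positions, and compare the correlation with the ground truth before and after.

```python
import random


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists (tied pairs add 0)."""
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)


def framewise_scorer(features):
    """Hypothetical temporally invariant model: each frame is scored on its own."""
    return [sum(f) / len(f) for f in features]


def smoothing_scorer(features):
    """Hypothetical temporally dependent model: each score averages over neighbours."""
    base = framewise_scorer(features)
    out = []
    for i in range(len(base)):
        window = base[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def score_under_shuffle(scorer, features, seed=0):
    """Shuffle frame order, score, then restore scores to original frame positions."""
    rng = random.Random(seed)
    perm = list(range(len(features)))
    rng.shuffle(perm)
    shuffled_scores = scorer([features[p] for p in perm])
    restored = [0.0] * len(features)
    for pos, original_index in enumerate(perm):
        restored[original_index] = shuffled_scores[pos]
    return restored


# Toy data (made up for illustration only, not taken from any benchmark).
features = [[0.1, 0.3], [0.9, 0.7], [0.4, 0.2], [0.8, 0.6], [0.05, 0.15]]
ground_truth = [2.0, 5.0, 3.0, 4.0, 1.0]

for name, scorer in [("framewise", framewise_scorer), ("smoothing", smoothing_scorer)]:
    tau_ordered = kendall_tau(scorer(features), ground_truth)
    tau_shuffled = kendall_tau(score_under_shuffle(scorer, features), ground_truth)
    print(name, tau_ordered, tau_shuffled)
```

By construction, the frame-wise scorer returns the same per-frame scores regardless of order, so its correlation cannot change under shuffling. A scorer that genuinely depends on temporal context can change in either direction, which is the kind of effect a disruption test is meant to expose.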
Abstract
An important aspect of summarizing videos is understanding the temporal context behind each part of the video to grasp what is and is not important. Video summarization models have in recent years modeled spatio-temporal relationships to represent this information. These models achieved state-of-the-art correlation scores on important benchmark datasets. However, what has not been reviewed is whether spatio-temporal relationships are even required to achieve state-of-the-art results. Previous work in activity recognition has found biases in which models prioritize static cues, such as scenes or objects, over motion information. In this paper we ask whether similar spurious relationships might influence the task of video summarization. To do so, we analyse the role that temporal information plays in existing benchmark datasets. We first estimate a baseline with temporally invariant models to see how…
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Multimedia Communication and Technology
