A Survey of Video Datasets for Grounded Event Understanding

Kate Sanders; Benjamin Van Durme

arXiv:2406.09646·cs.CV·June 17, 2024

TL;DR

This survey reviews 105 video datasets focused on event understanding, highlighting the need for better task framing and dataset curation to advance multimodal AI's common-sense reasoning capabilities.

Contribution

The survey provides a comprehensive analysis of existing datasets and tasks for video event understanding, and proposes guidelines for future dataset creation and task design.

Findings

01. Many datasets focus on specific event types, limiting generalization.

02. Current tasks lack standardization, hindering progress.

03. Temporal aspects and ambiguity are crucial for effective event understanding.

Abstract

While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, video benchmark tasks have implicitly tested for this ability (e.g., video captioning, in which models describe visual events with natural language), but they do not consider video event understanding as a task in itself. Recent work has begun to explore video analogues to textual event extraction but consists of competing task definitions and datasets limited to highly specific event types. Therefore, while there is a rich domain of event-centric video research spanning the past 10+ years, it is…

Code & Models

Repositories

katesanders9/grounded-events (Official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Seismology and Earthquake Studies · Topic Modeling