TL;DR
This paper introduces MECD+, a framework and dataset for uncovering detailed event-level causal graphs in long videos, extending causal understanding beyond brief segments that contain only isolated events.
Contribution
It proposes a new task, a new dataset, and a Granger causality-inspired framework with advanced inference techniques for comprehensive causal discovery in videos.
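This summary describes the framework only as Granger causality-inspired. The following rough illustration of that general idea is not the paper's actual method. A premise event is treated as a cause of the result event if masking it noticeably worsens how well the remaining context predicts the result. In this sketch, `granger_style_causal_mask`, the `Scorer` signature, `toy_score`, and the `threshold` value are all hypothetical. A real system would replace `toy_score` with a learned video-language model.

```python
from typing import Callable, Optional, Sequence

# Hypothetical scorer: how well a (possibly masked) premise context predicts
# the result event. A masked premise is passed as None.
Scorer = Callable[[Sequence[Optional[str]], str], float]


def granger_style_causal_mask(
    premises: Sequence[str],
    result: str,
    score: Scorer,
    threshold: float = 0.1,  # hypothetical decision margin
) -> list[bool]:
    """Flag premise i as causal if masking it lowers the result score by more than `threshold`."""
    full = score(list(premises), result)
    flags = []
    for i in range(len(premises)):
        # Mask only premise i; keep every other premise in chronological order.
        masked = [None if j == i else p for j, p in enumerate(premises)]
        drop = full - score(masked, result)
        flags.append(drop > threshold)
    return flags


def toy_score(context: Sequence[Optional[str]], result: str) -> float:
    """Toy stand-in for a model: fraction of result words seen in unmasked premises."""
    words = set(result.lower().split())
    seen: set[str] = set()
    for event in context:
        if event is not None:
            seen |= set(event.lower().split())
    return len(words & seen) / len(words)


premises = [
    "man fills bucket with water",
    "dog barks outside",
    "man pours water on fire",
]
print(granger_style_causal_mask(premises, "fire is out", toy_score))
# prints [False, False, True]
```

The toy scorer also exposes a known weakness of masking one event at a time. If two premises carry redundant information, masking either one alone may leave the score unchanged, so a genuine cause can be missed.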
Findings
Outperforms GPT-4o and VideoChat2 in causal reasoning accuracy.
Enables improved downstream video understanding tasks.
Effectively models complex, interconnected causal relations in videos.
Abstract
Video causal reasoning aims to achieve a high-level understanding of videos from a causal perspective. However, its scope is limited. It is primarily carried out in a question-answering paradigm and focuses on brief video segments containing isolated events and basic causal relations. As a result, it lacks comprehensive and structured causality analysis for videos with multiple interconnected events. To fill this gap, we introduce a new task and dataset, Multi-Event Causal Discovery (MECD). It aims to uncover the causal relations between events distributed chronologically across long videos. Given visual segments and textual descriptions of events, MECD identifies the causal associations between these events to derive a comprehensive and structured event-level video causal graph explaining why and how the result event occurred. To address the challenges of MECD, we devise a novel framework…
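The abstract frames the output of MECD as an event-level causal graph over chronologically ordered events. The following is a minimal sketch of such a structure. It assumes that each edge points from an earlier event to a later one, and that "why the result event occurred" corresponds to the set of the result's direct and indirect causes. The `Event` fields, `EventCausalGraph`, and `causes_of` are illustrative names, not the dataset's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    index: int     # chronological position of the event in the video
    caption: str   # textual description of the event
    start: float   # segment start time in seconds (hypothetical field)
    end: float     # segment end time in seconds (hypothetical field)


@dataclass
class EventCausalGraph:
    events: list[Event]
    edges: set[tuple[int, int]] = field(default_factory=set)  # (cause, effect)

    def add_edge(self, cause: int, effect: int) -> None:
        # Assumption: a cause must occur before its effect in the video.
        if not cause < effect:
            raise ValueError("cause must precede effect chronologically")
        self.edges.add((cause, effect))

    def causes_of(self, effect: int) -> set[int]:
        """Return all direct and indirect causes (ancestors) of an event."""
        found: set[int] = set()
        stack = [effect]
        while stack:
            node = stack.pop()
            for c, e in self.edges:
                if e == node and c not in found:
                    found.add(c)
                    stack.append(c)
        return found


events = [
    Event(0, "a man fills a bucket with water", 0.0, 4.0),
    Event(1, "he carries the bucket to the fire", 4.0, 9.0),
    Event(2, "the fire flares up", 9.0, 12.0),
    Event(3, "he pours water and the fire goes out", 12.0, 18.0),
]
g = EventCausalGraph(events)
for cause, effect in [(0, 1), (1, 3), (2, 3)]:
    g.add_edge(cause, effect)
print(sorted(g.causes_of(3)))  # prints [0, 1, 2]
```

Restricting edges to point forward in time keeps the graph acyclic by construction, so the ancestor search always terminates.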
Taxonomy
Methods: Causal inference
