Two Causally Related Needles in a Video Haystack

Miaoyu Li; Qin Chao; Boyang Li

arXiv:2505.19853·cs.CV·November 7, 2025

TL;DR

This paper introduces Causal2Needles, a new benchmark for evaluating long-video understanding in Video-Language Models, focusing on extracting and relating two separate pieces of information and modeling cause-effect relationships.

Contribution

The paper presents a novel benchmark with diverse question types that challenge models to understand long videos and causal relationships, revealing limitations of current models.

Findings

01 Models perform poorly on causal two-needle questions.

02 Performance decreases as the distance between the two needles increases.

03 Current VLMs struggle with causal and long-context understanding.
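One way to inspect the second finding is to bucket per-question results by the distance between the two needles and compare accuracy across buckets. The sketch below is illustrative only: the function name `accuracy_by_distance`, the `(distance, correct)` record layout, the bin size, and the toy records are all assumptions made for this example, not the benchmark's actual data format or the authors' evaluation code.

```python
from collections import defaultdict


def accuracy_by_distance(records, bin_size):
    """Group (distance, correct) records into distance bins.

    Returns a dict mapping each bin's lower bound to the accuracy in that bin.
    The record layout is hypothetical, chosen only for this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for distance, correct in records:
        bin_start = (distance // bin_size) * bin_size
        totals[bin_start] += 1
        hits[bin_start] += int(correct)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Made-up toy records: (needle distance in clips, answered correctly?).
# These are NOT results from the paper.
toy_records = [
    (2, True), (3, True),
    (12, True), (15, False),
    (27, False), (29, False),
]

result = accuracy_by_distance(toy_records, bin_size=10)
for bin_start, acc in result.items():
    print(f"distance {bin_start}-{bin_start + 9}: accuracy {acc:.2f}")
```

With real per-question outputs in place of the toy records, a downward trend across bins would be consistent with the reported distance effect.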

Abstract

Properly evaluating the ability of Video-Language Models (VLMs) to understand long videos remains a challenge. We propose a long-context video understanding benchmark, Causal2Needles, that assesses two crucial abilities insufficiently addressed by existing benchmarks: (1) extracting information from two separate locations (two needles) in a long video and understanding them jointly, and (2) modeling the world in terms of cause and effect in human behaviors. Causal2Needles evaluates these abilities using noncausal one-needle, causal one-needle, and causal two-needle questions. The most complex question type, the causal two-needle question, requires extracting information from both the cause and effect events from a long video and the associated narration text. To prevent textual bias, we introduce two complementary question formats: locating the video clip containing the answer, and verbal…

Code & Models

Datasets

causal2needles/Causal2Needles
dataset · 494 dl

Taxonomy

Topics: Philosophy and History of Science

Methods: Contrastive Language-Image Pre-training