CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos

Xuchen Li; Xuzhao Li; Shiyu Hu; Kaiqi Huang; Wentao Zhang

arXiv:2507.16878 · cs.CV · July 24, 2025

TL;DR

CausalStep is a new benchmark for evaluating explicit stepwise causal reasoning in videos, designed to challenge models with causally linked questions and diagnostic metrics, revealing gaps in current AI reasoning capabilities.

Contribution

We introduce CausalStep, a benchmark that combines causally linked video segments, a stepwise QA protocol, distractors, and diagnostic metrics to rigorously assess causal reasoning in video understanding.

Findings

01. Current models lag behind human reasoning on CausalStep

02. CausalStep reveals limitations of existing video reasoning models

03. Benchmark enables detailed diagnosis of causal reasoning skills

Abstract

Recent advances in large language models (LLMs) have improved reasoning in text and image domains, yet achieving robust video reasoning remains a significant challenge. Existing video benchmarks mainly assess shallow understanding and reasoning and allow models to exploit global context, so they fail to rigorously evaluate true causal and stepwise reasoning. We present CausalStep, a benchmark designed for explicit stepwise causal reasoning in videos. CausalStep segments videos into causally linked units and enforces a strict stepwise question-answer (QA) protocol, requiring sequential answers and preventing shortcut solutions. Each question includes carefully constructed distractors based on an error type taxonomy to ensure diagnostic value. The benchmark features 100 videos across six categories and 1,852 multiple-choice QA pairs. We introduce seven diagnostic metrics for comprehensive…
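
The stepwise protocol described in the abstract can be pictured with a small sketch. This is not the authors' evaluation code: the `Step` schema, the `run_stepwise` function, the toy model, and the rule that a chain stops at the first wrong answer are all assumptions made for illustration. The returned chain length is likewise an illustrative quantity, not one of the paper's seven diagnostic metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    """One causally linked segment and its multiple-choice question (hypothetical schema)."""
    segment_id: int
    question: str
    options: List[str]   # the correct answer plus distractors
    answer_index: int    # index of the correct option


# A "model" is any callable that sees only the segments revealed so far.
Model = Callable[[Sequence[int], str, List[str]], int]


def run_stepwise(steps: List[Step], model: Model) -> int:
    """Return how many consecutive steps were answered correctly from the start.

    Assumption: the chain stops at the first wrong answer, so a later
    question cannot be reached without solving the earlier ones.
    """
    visible: List[int] = []
    for solved, step in enumerate(steps):
        visible.append(step.segment_id)  # reveal segments only up to the current one
        choice = model(tuple(visible), step.question, step.options)
        if choice != step.answer_index:
            return solved
    return len(steps)


def always_first(visible, question, options):
    """Toy baseline that always picks option 0."""
    return 0


steps = [
    Step(0, "What happens first?", ["A", "B", "C", "D"], 0),
    Step(1, "What does that cause?", ["A", "B", "C", "D"], 0),
    Step(2, "What is the final outcome?", ["A", "B", "C", "D"], 2),
]
chain_length = run_stepwise(steps, always_first)
print(chain_length)  # 2: the baseline fails on the third question
```

Passing the model only the segments revealed so far is one way to model "preventing shortcut solutions": the callable never receives later segments, so it cannot lean on global context.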

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Games · Adversarial Robustness in Machine Learning