SCP: Spatial Causal Prediction in Video

Yanguang Zhao; Jie Yang; Shengqiong Wu; Shutong Hu; Hongbo Qiu; Yu Wang; Guijia Zhang; Tan Kai Ze; Hao Fei; Chia-Wen Lin; Mong-Li Lee; Wynne Hsu

arXiv:2603.03944 · cs.CV · April 7, 2026

TL;DR

This paper introduces Spatial Causal Prediction (SCP), a task that evaluates whether models can infer unseen past or future spatial states in videos, together with the SCP-Bench benchmark. It highlights the limitations of current models and proposes strategies for improvement.

Contribution

The paper defines SCP as a new task, constructs the SCP-Bench benchmark (2,500 QA pairs across 1,181 videos), and evaluates 23 models to identify performance gaps and guide future research.

Findings

01 Models fall substantially short of human performance.

02 Models show limited ability to extrapolate temporally and to infer causality.

03 The proposed perception and reasoning strategies improve spatial causal understanding.

Abstract

Spatial reasoning, the ability to understand spatial relations, causality, and dynamic evolution, is central to human intelligence and essential for real-world applications such as autonomous driving and robotics. Existing studies, however, primarily assess models on visible spatio-temporal understanding, overlooking their ability to infer unseen past or future spatial states. In this work, we introduce Spatial Causal Prediction (SCP), a new task paradigm that challenges models to reason beyond observation and predict spatial causal outcomes. We further construct SCP-Bench, a benchmark comprising 2,500 QA pairs across 1,181 videos spanning diverse viewpoints, scenes, and causal directions, to support systematic evaluation. Through comprehensive experiments on 23 state-of-the-art models, we reveal substantial gaps between human and model performance, limited temporal extrapolation, and…

Code & Models

Repositories

https://guangstrip.github.io/SCP-Bench
