PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning
Qiran Zhang, Yuheng Wang, Runde Yang, Lin Wu, Jingru Fan, Shu Yao, Jie Zhang, Tianle Zhou, Huatao Li, Ruijie Shi, Yihan Li, Chen Qian

TL;DR
PRISM is a benchmark of 10,372 human-calibrated instruction-code pairs for evaluating spatial-temporal reasoning in programmatic video generation. It exposes a gap in current language models between code that executes and code that renders spatially correct animations.
Contribution
Introduces PRISM, a large-scale, real-world benchmark with a novel evaluation framework for assessing spatial and temporal accuracy in programmatic video generation.
Findings
Seven mainstream LLMs show a 41% drop from execution success to spatial correctness.
Current models often produce executable code that lacks spatial coherence.
Evaluation should extend beyond code executability to include spatial-temporal accuracy.
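As a rough illustration of the first finding, the sketch below computes an execution-success rate, a spatial-correctness rate, and the drop between them over a set of per-sample results. Everything here is an assumption made for illustration: the `SampleResult` record, the boolean pass/fail notion of spatial correctness, and the `funnel_rates` and `funnel_drop` helpers are hypothetical and are not taken from the PRISM codebase. The summary does not say whether the reported 41% is an absolute or a relative drop, so the sketch returns both.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SampleResult:
    """Hypothetical per-sample outcome; not the paper's actual schema."""
    executed: bool           # generated code ran and rendered without error
    spatially_correct: bool  # assumed pass/fail layout check on the rendered animation


def funnel_rates(results: List[SampleResult]) -> Tuple[float, float]:
    """Return (execution rate, spatial-correctness rate) over all samples.

    Funnel assumption: a sample can only count as spatially correct
    if it executed in the first place.
    """
    total = len(results)
    if total == 0:
        raise ValueError("results must not be empty")
    executed = [r for r in results if r.executed]
    exec_rate = len(executed) / total
    spatial_rate = sum(r.spatially_correct for r in executed) / total
    return exec_rate, spatial_rate


def funnel_drop(exec_rate: float, spatial_rate: float) -> Tuple[float, float]:
    """Return (absolute drop, relative drop) between the two funnel stages."""
    absolute = exec_rate - spatial_rate
    relative = absolute / exec_rate if exec_rate > 0 else 0.0
    return absolute, relative


if __name__ == "__main__":
    # Toy data: 10 samples, 9 execute, 5 of those are spatially correct.
    toy = (
        [SampleResult(True, True)] * 5
        + [SampleResult(True, False)] * 4
        + [SampleResult(False, False)] * 1
    )
    exec_rate, spatial_rate = funnel_rates(toy)
    absolute, relative = funnel_drop(exec_rate, spatial_rate)
    print(f"execution rate: {exec_rate:.1%}")
    print(f"spatial rate:   {spatial_rate:.1%}")
    print(f"absolute drop:  {absolute:.1%}")
    print(f"relative drop:  {relative:.1%}")
```

Reporting both stages side by side, rather than executability alone, is what makes the gap visible: a model can score well on the first stage and still lose a large share of its samples at the second.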
Abstract
Programmatic video generation through code offers geometric precision and temporal coherence beyond pixel-level diffusion models, yet rigorously evaluating whether language models can produce spatially correct animated outputs remains an open problem. We introduce PRISM, a large-scale benchmark of 10,372 human-calibrated instruction-code pairs (20 times larger than prior programmatic video generation benchmarks), grounded in real-world knowledge visualization scenarios across English and Chinese and spanning 437 subject categories. We further propose a funnel-style evaluation framework with four complementary metrics: Code-Level Reliability for executability, Spatial Reasoning for layout correctness over full animation sequences, and Prompt-Aware Dynamic Visual Complexity (PADVC) and Temporal Density (TD) for diagnosing dynamic expression and temporal activity. Systematic evaluation of…
