V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

Yang Luo; Xuanlei Zhao; Baijiong Lin; Lingting Zhu; Liyao Tang; Yuqi Liu; Ying-Cong Chen; Shengju Qian; Xin Wang; Yang You

arXiv:2511.16668 · cs.CV · November 21, 2025

Open Access

TL;DR

V-ReasonBench is a benchmark suite that evaluates reasoning in video generation models across four dimensions: structured problem-solving, spatial cognition, pattern-based inference, and physical dynamics. It addresses the need for systematic and reliable assessment of these capabilities.

Contribution

The paper introduces V-ReasonBench, a unified benchmark that assesses video reasoning across four dimensions using answer-verifiable tasks built from synthetic and real-world image sequences.

Findings

01. Reasoning ability varies significantly across models and dimensions.

02. Video models show different hallucination behaviors compared to image models.

03. Longer video durations impact Chain-of-Frames reasoning performance.

Abstract

Recent progress in generative video models, such as Veo-3, has shown surprising zero-shot reasoning abilities, creating a growing need for systematic and reliable evaluation. We introduce V-ReasonBench, a benchmark designed to assess video reasoning across four key dimensions: structured problem-solving, spatial cognition, pattern-based inference, and physical dynamics. The benchmark is built from both synthetic and real-world image sequences and provides a diverse set of answer-verifiable tasks that are reproducible, scalable, and unambiguous. Evaluations of six state-of-the-art video models reveal clear dimension-wise differences, with strong variation in structured, spatial, pattern-based, and physical reasoning. We further compare video models with strong image models, analyze common hallucination behaviors, and study how video duration affects Chain-of-Frames reasoning. Overall,…
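
As a rough illustration of what answer-verifiable, dimension-wise evaluation could look like, the sketch below aggregates pass/fail outcomes into a per-dimension pass rate. It is not the paper's evaluation code: the function name `pass_rate_by_dimension`, the `(dimension, passed)` record format, and the sample results are all hypothetical, and only the four dimension names are taken from the abstract.

```python
from collections import defaultdict

# The four reasoning dimensions named in the abstract.
DIMENSIONS = (
    "structured problem-solving",
    "spatial cognition",
    "pattern-based inference",
    "physical dynamics",
)


def pass_rate_by_dimension(results):
    """Aggregate (dimension, passed) pairs into a pass rate per dimension.

    `results` is a hypothetical record format: each item pairs a dimension
    name with a bool saying whether the generated video's answer matched
    the task's verifiable ground truth.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for dimension, passed in results:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        totals[dimension] += 1
        passes[dimension] += int(passed)
    # Report only dimensions that have at least one evaluated task.
    return {d: passes[d] / totals[d] for d in DIMENSIONS if totals[d]}


# Made-up outcomes for illustration only, not results from the paper.
sample_results = [
    ("spatial cognition", True),
    ("spatial cognition", False),
    ("physical dynamics", True),
]
print(pass_rate_by_dimension(sample_results))
```

Reporting one rate per dimension, rather than a single pooled score, keeps the dimension-wise differences the abstract describes visible.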

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Artificial Intelligence in Games