TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Harold Haodong Chen; Disen Lan; Wen-Jie Shu; Qingyang Liu; Zihan Wang; Sirui Chen; Wenkai Cheng; Kanghao Chen; Hongfei Zhang; Zixin Zhang; Rongjin Guo; Yu Cheng; Ying-Cong Chen

arXiv:2511.13704·cs.CV·December 23, 2025

TL;DR

TiViBench is a hierarchical benchmark designed to evaluate reasoning abilities in video generative models across multiple dimensions, revealing strengths and limitations of current models and proposing a test-time strategy to improve reasoning performance.

Contribution

The paper introduces TiViBench, a comprehensive benchmark for reasoning in video models, and VideoTPO, a novel test-time strategy to enhance reasoning without extra training.

Findings

01

Commercial models show stronger reasoning capabilities.

02

Open-source models have untapped potential limited by data.

03

VideoTPO improves reasoning performance significantly.

Abstract

The rapid evolution of video generative models has shifted their focus from producing visually plausible outputs to tackling tasks requiring physical plausibility and logical consistency. However, despite recent breakthroughs such as Veo 3's chain-of-frames reasoning, it remains unclear whether these models can exhibit reasoning capabilities similar to large language models (LLMs). Existing benchmarks predominantly evaluate visual fidelity and temporal coherence, failing to capture higher-order reasoning abilities. To bridge this gap, we propose TiViBench, a hierarchical benchmark specifically designed to evaluate the reasoning capabilities of image-to-video (I2V) generation models. TiViBench systematically assesses reasoning across four dimensions: i) Structural Reasoning & Search, ii) Spatial & Visual Pattern Reasoning, iii) Symbolic & Logical Reasoning, and iv) Action Planning & Task…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics