LoCoT2V-Bench: Benchmarking Long-Form and Complex Text-to-Video Generation

Xiangqing Zheng; Chengyue Wu; Kehai Chen; Min Zhang

arXiv:2510.26412 · cs.CV · February 2, 2026

TL;DR

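LoCoT2V-Bench is a benchmark, paired with the LoCoT2V-Eval evaluation framework, for long-form text-to-video generation from complex multi-scene prompts. Across 13 representative models, perceptual quality and background consistency are strong, while fine-grained text-video alignment and character consistency remain weak.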

Contribution

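The paper presents LoCoT2V-Bench, a benchmark of multi-scene prompts with hierarchical metadata (e.g., character settings and camera behaviors) built from real-world videos, and LoCoT2V-Eval, a framework that evaluates long video generation along five dimensions: perceptual quality, text-video alignment, temporal quality, dynamic quality, and Human Expectation Realization Degree (HERD).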

Findings

1. Models excel in perceptual quality and background consistency.
2. Fine-grained text-video alignment is weak across models.
3. Character consistency remains a significant challenge.

Abstract

Recent advances in text-to-video generation have achieved impressive performance on short clips, yet evaluating long-form generation under complex textual inputs remains a significant challenge. In response to this challenge, we present LoCoT2V-Bench, a benchmark for long video generation (LVG) featuring multi-scene prompts with hierarchical metadata (e.g., character settings and camera behaviors), constructed from collected real-world videos. We further propose LoCoT2V-Eval, a multi-dimensional framework covering perceptual quality, text-video alignment, temporal quality, dynamic quality, and Human Expectation Realization Degree (HERD), with an emphasis on aspects such as fine-grained text-video alignment and temporal character consistency. Experiments on 13 representative LVG models reveal pronounced capability disparities across evaluation dimensions, with strong perceptual quality…
