T2VEval: Benchmark Dataset and Objective Evaluation Method for T2V-generated Videos
Zelu Qi, Ping Shi, Shuqi Wang, Chaoyang Zhang, Fei Zhao, Zefeng Ying, Da Pan, Xi Yang, Zheqi He, Teng Dai

TL;DR
This paper introduces T2VEval-Bench, a multi-dimensional benchmark dataset, and T2VEval, an objective evaluation method, for assessing the quality of text-to-video (T2V) generated videos, whose complex distortions make quality assessment difficult.
Contribution
It presents T2VEval-Bench, a new dataset with diverse videos and prompts, and develops T2VEval, a novel multi-branch fusion model for accurate T2V video quality evaluation.
Findings
T2VEval outperforms existing metrics on multiple evaluation benchmarks.
The dataset includes 148 prompts and 1,783 videos from 13 T2V models.
The method effectively captures multiple quality dimensions through a fusion approach.
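The dataset figures above allow a quick sanity check on how completely the prompt-by-model grid is covered. The sketch below is only illustrative: it assumes each model generates at most one video per prompt, which the summary does not state, and the function name `coverage_summary` is hypothetical.

```python
# Figures reported in the summary above.
NUM_PROMPTS = 148
NUM_MODELS = 13
NUM_VIDEOS = 1783


def coverage_summary(prompts, models, videos):
    """Compare the reported video count with a full prompt-by-model grid.

    Assumption (not stated in the source): each model generates at most
    one video per prompt, so a full grid has prompts * models videos.
    """
    full_grid = prompts * models
    return {
        "full_grid": full_grid,
        "missing_pairs": full_grid - videos,
        "coverage": videos / full_grid,
    }


summary = coverage_summary(NUM_PROMPTS, NUM_MODELS, NUM_VIDEOS)
print(summary)
```

Under that assumption a full grid would hold 1,924 videos, so 141 prompt-model pairs are absent from the dataset (roughly 93% coverage). The summary does not say why those pairs are missing.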
Abstract
Recent advances in text-to-video (T2V) technology, as demonstrated by models such as Runway Gen-3, Pika, Sora, and Kling, have significantly broadened the applicability and popularity of the technology. This progress has created a growing demand for accurate quality assessment metrics to evaluate the perceptual quality of T2V-generated videos and optimize video generation models. However, assessing the quality of text-to-video outputs remains challenging due to the presence of highly complex distortions, such as unnatural actions and phenomena that defy human cognition. To address these challenges, we constructed T2VEval-Bench, a multi-dimensional benchmark dataset for text-to-video quality evaluation, which contains 148 textual prompts and 1,783 videos generated by 13 T2V models. To ensure a comprehensive evaluation, we scored each video on four dimensions in the subjective experiment,…
Taxonomy
Topics: Innovative Educational Techniques · Multimedia Communication and Technology · Video Analysis and Summarization
