Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

TL;DR
This paper introduces DEVIL, an evaluation protocol that assesses text-to-video models by the dynamics of the videos they generate; its scores correlate highly with human judgments.
Contribution
The study proposes a dynamics-centered evaluation protocol, benchmark, and metrics for more comprehensive assessment of T2V models.
Findings
DEVIL achieves over 90% Pearson correlation with human ratings.
The benchmark's text prompts span multiple dynamics grades.
Metrics effectively evaluate dynamics range, controllability, and quality.
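The Pearson correlation reported above measures how linearly a metric's scores track human ratings. A minimal sketch of that computation follows; the function name `pearson` and the two sample lists are hypothetical placeholders for illustration, not data or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one automatic score and one human rating per video.
metric_scores = [0.12, 0.35, 0.50, 0.71, 0.90]
human_ratings = [1.0, 2.0, 3.5, 4.0, 5.0]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 means the automatic score rises almost linearly with the human rating; "over 90%" corresponds to r above 0.9.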
Abstract
Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to text prompts. In this study, we propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models. For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video. Based on the new benchmark and the dynamics scores, we assess T2V models with…
Taxonomy
Topics: Artificial Intelligence in Games · Multimedia Communication and Technology · Data Visualization and Analytics
Methods: Sparse Evolutionary Training · Focus
