Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

Mingxiang Liao; Hannan Lu; Xinyu Zhang; Fang Wan; Tianyu Wang; Yuzhong Zhao; Wangmeng Zuo; Qixiang Ye; Jingdong Wang

arXiv:2407.01094 · cs.CV · July 2, 2024 · 3 citations

TL;DR

This paper introduces DEVIL, an evaluation protocol that assesses text-to-video models on the dynamics of the videos they generate. Its metrics correlate highly with human judgments.

Contribution

The study proposes a dynamics-centered evaluation protocol, benchmark, and metrics for more comprehensive assessment of T2V models.

Findings

01

DEVIL achieves over 90% Pearson correlation with human ratings.

02

The benchmark reflects multiple dynamics grades in text prompts.

03

Metrics effectively evaluate dynamics range, controllability, and quality.
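
To make the first finding concrete, the sketch below shows how a Pearson correlation between automatic scores and human ratings can be computed. It is a minimal illustration of the statistic only, not the DEVIL implementation: the function name `pearson` and the sample numbers are assumptions invented for this example and do not come from the paper.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: one automatic score and one mean human rating per model.
    metric_scores = [0.42, 0.55, 0.61, 0.70, 0.83]
    human_ratings = [2.1, 2.9, 3.0, 3.8, 4.4]
    print(f"Pearson r = {pearson(metric_scores, human_ratings):.3f}")
```

A coefficient above 0.9, which the page reports as "over 90%", means the automatic scores and the human ratings are close to linearly related.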

Abstract

Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to text prompts. In this study, we propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models. For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video. Based on the new benchmark and the dynamics scores, we assess T2V models with…

Code & Models

Repositories

mingxiangl/devil
PyTorch · Official

Models

Video-Bench/Video-Bench (Hugging Face)
model · 1 download · 1 like

Taxonomy

Topics: Artificial Intelligence in Games · Multimedia Communication and Technology · Data Visualization and Analytics

Methods: Sparse Evolutionary Training · Focus