EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan

TL;DR
This paper introduces EvalCrafter, a comprehensive benchmarking framework for large video generation models that uses a diverse prompt set and multiple metrics, aligning evaluation scores with human preferences.
Contribution
It presents a novel evaluation pipeline with 700 prompts and a human-aligned scoring method, improving assessment accuracy of video generation models.
Findings
The proposed benchmark correlates better with human judgments than traditional metrics.
EvalCrafter enables detailed analysis of visual, content, motion, and alignment qualities.
State-of-the-art models are evaluated comprehensively using the new framework.
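As a rough illustration of what "correlates better with human judgments" can mean in practice, the sketch below compares two automatic metrics against human ratings using Spearman rank correlation. This is not the paper's scoring method. The ratings, the metric values, and the names `metric_a` and `metric_b` are made up for the example, and rank correlation is just one common way to measure agreement.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: one human rating and two metric scores per video.
human = [4.5, 3.0, 2.0, 4.0, 1.5]
metric_a = [0.81, 0.60, 0.42, 0.75, 0.30]  # orders videos like the humans do
metric_b = [0.50, 0.70, 0.65, 0.40, 0.55]  # orders them differently

rho_a = spearman(metric_a, human)  # 1.0
rho_b = spearman(metric_b, human)  # -0.5
print(f"metric_a: {rho_a:.2f}, metric_b: {rho_b:.2f}")
```

With these toy numbers, `metric_a` ranks the videos exactly as the human raters do, while `metric_b` mostly disagrees with them, so `metric_a` would count as the better-aligned metric under this measure.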
Abstract
Vision and language generative models have grown rapidly in recent years. For video generation, various open-source models and publicly available services have been developed to generate high-quality videos. However, these methods often use only a few metrics, e.g., FVD or IS, to evaluate performance. We argue that it is hard to judge large conditional generative models from such simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities. Thus, we propose a novel framework and pipeline for exhaustively evaluating the performance of the generated videos. Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation, which is based on an analysis of real-world user data and generated with the assistance of a large language model. Then, we evaluate the state-of-the-art video generative models…
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: ALIGN
