EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu; Xiaodong Cun; Xuebo Liu; Xintao Wang; Yong Zhang; Haoxin Chen; Yang Liu; Tieyong Zeng; Raymond Chan; Ying Shan

arXiv:2310.11440 · cs.CV · March 26, 2024 · 5 cites

TL;DR

This paper introduces EvalCrafter, a comprehensive benchmarking framework for large video generation models that uses a diverse prompt set and multiple metrics, aligning evaluation scores with human preferences.

Contribution

It presents a novel evaluation pipeline with 700 prompts and a human-aligned scoring method, improving assessment accuracy of video generation models.
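
The summary above does not say how the human-aligned scoring is computed. As a minimal sketch of one plausible approach, the snippet below fits a linear map from several objective metric scores to human ratings by ordinary least squares. The function names `fit_alignment_weights` and `aligned_score`, the choice of a linear model, and the synthetic data in the usage note are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def fit_alignment_weights(metric_scores, human_scores):
    """Fit weights and an intercept mapping metric scores to human ratings.

    Hypothetical helper: a plain least-squares fit, assumed for illustration.
    metric_scores has shape (n_videos, n_metrics); human_scores has shape (n_videos,).
    """
    X = np.asarray(metric_scores, dtype=float)
    y = np.asarray(human_scores, dtype=float)
    # Append a column of ones so the fit includes an intercept term.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coeffs[:-1], coeffs[-1]


def aligned_score(metric_scores, weights, intercept):
    """Combine raw metric scores into one human-aligned score per video."""
    return np.asarray(metric_scores, dtype=float) @ weights + intercept
```

In use, one would fit the weights on videos that have both metric scores and human ratings, then apply `aligned_score` to videos that have metric scores only. A linear fit is the simplest choice here; any regressor could be substituted.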

Findings

01

The proposed benchmark correlates better with human judgments than traditional metrics.

02

EvalCrafter enables detailed analysis of visual, content, motion, and alignment qualities.

03

State-of-the-art models are evaluated comprehensively using the new framework.
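
Agreement between a benchmark score and human judgments, as in the first finding, is commonly quantified with a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. Whether the paper reports this exact statistic is not stated here, and the helper names `average_ranks`, `pearson`, and `spearman` are illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A value near 1 means the metric orders videos the same way human raters do; a value near 0 means the two orderings are unrelated.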

Abstract

Vision and language generative models have grown rapidly in recent years. For video generation, various open-source models and publicly available services have been developed to generate high-quality videos. However, these methods are often evaluated with only a few metrics, e.g., FVD or IS. We argue that it is hard to judge large conditional generative models with such simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities. Thus, we propose a novel framework and pipeline for exhaustively evaluating the performance of the generated videos. Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation, which is based on an analysis of real-world user data and generated with the assistance of a large language model. Then, we evaluate the state-of-the-art video generative models…

Code & Models

Repositories

EvalCrafter/EvalCrafter
PyTorch · Official

Datasets

RaphaelLiu/EvalCrafter_T2V_Dataset
dataset · 75 downloads

Taxonomy

Topics

Multimodal Machine Learning Applications · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis

Methods

ALIGN