VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Ziqi Huang; Fan Zhang; Xiaojie Xu; Yinan He; Jiashuo Yu; Ziyue Dong; Qianli Ma; Nattapol Chanpaisit; Chenyang Si; Yuming Jiang; Yaohui Wang; Xinyuan Chen; Ying-Cong Chen; Limin Wang; Dahua Lin; Yu Qiao; Ziwei Liu

arXiv:2411.13503 · cs.CV · November 21, 2024

TL;DR

VBench++ is a comprehensive, human-aligned benchmark suite for evaluating various aspects of video generative models, supporting diverse content types and providing detailed insights into model strengths and weaknesses.

Contribution

We introduce VBench++, a novel benchmark suite that dissects video generation quality into multiple dimensions, aligns with human perception, and supports versatile evaluation scenarios including text-to-video and image-to-video.

Findings

1. VBench++ evaluates 16 detailed dimensions of video quality.
2. The benchmark aligns well with human preferences.
3. It reveals current models' strengths and gaps across different content types.

Abstract

Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has several appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship). The evaluation metrics with fine-grained levels reveal individual models' strengths and…

Code & Models

Repositories

vchitect/vbench
PyTorch · Official

Models

Video-Bench/Video-Bench
model · 1 dl · ♡ 1

Taxonomy

Topics: Advanced Vision and Imaging · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis

Methods: ALIGN