T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Kaiyue Sun; Kaiyi Huang; Xian Liu; Yue Wu; Zihan Xu; Zhenguo Li; Xihui Liu

arXiv:2407.14505 · cs.CV · January 16, 2025 · 1 citation

TL;DR

This paper introduces T2V-CompBench, a comprehensive benchmark for evaluating the ability of text-to-video models to generate videos with complex compositional attributes, highlighting current challenges and proposing new evaluation metrics.

Contribution

It presents the first dedicated benchmark for compositional text-to-video generation, including diverse evaluation metrics and a thorough analysis of existing models.

Findings

01 Current models struggle with compositional tasks

02 Proposed metrics correlate well with human judgment

03 Benchmark covers 7 compositional categories with 1400 prompts

Abstract

Text-to-video (T2V) generative models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this important ability in their evaluation. In this work, we conduct the first systematic study on compositional text-to-video generation. We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation. T2V-CompBench encompasses diverse aspects of compositionality, including consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy. We further carefully design multimodal large language model (MLLM)-based, detection-based, and tracking-based evaluation metrics, which can better reflect the compositional text-to-video…
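
The seven categories named in the abstract lend themselves to a simple per-category score report. The following is a minimal sketch, not the benchmark's actual code: the `Category` enum only mirrors the category names listed above, and `per_category_mean` together with the (category, score) input format are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean


class Category(Enum):
    """The seven compositional categories named in the abstract."""
    CONSISTENT_ATTRIBUTE_BINDING = "consistent attribute binding"
    DYNAMIC_ATTRIBUTE_BINDING = "dynamic attribute binding"
    SPATIAL_RELATIONSHIPS = "spatial relationships"
    MOTION_BINDING = "motion binding"
    ACTION_BINDING = "action binding"
    OBJECT_INTERACTIONS = "object interactions"
    GENERATIVE_NUMERACY = "generative numeracy"


def per_category_mean(scores):
    """Average per-prompt scores within each category.

    `scores` is an iterable of (Category, float) pairs. This input format
    is an assumption for illustration, not the benchmark's own format.
    """
    buckets = defaultdict(list)
    for category, score in scores:
        buckets[category].append(score)
    return {category: mean(values) for category, values in buckets.items()}
```

Reporting one mean per category, rather than a single pooled number, keeps a model's weakness in one category (say, generative numeracy) from being hidden by strength in another.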

Code & Models

Repositories

KaiyueSun98/T2V-CompBench (PyTorch, official)

Models

Video-Bench/Video-Bench (Hugging Face model · 1 download · 1 like)

Taxonomy

Topics: Video Analysis and Summarization · Natural Language Processing Techniques · Multimodal Machine Learning Applications