TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Weixi Feng; Jiachen Li; Michael Saxon; Tsu-jui Fu; Wenhu Chen; William Yang Wang

arXiv:2406.08656·cs.CV·June 14, 2024·1 cites

TL;DR

This paper introduces TC-Bench, a comprehensive benchmark for evaluating the temporal compositionality of text-to-video and image-to-video generation models, highlighting current models' limitations in capturing complex scene transitions over time.

Contribution

The study presents TC-Bench, a novel benchmark with new metrics and aligned real-world videos to evaluate and improve temporal compositionality in video generation models.

Findings

01

Most models achieve less than 20% of expected compositional changes.

02

New metrics correlate better with human judgments than existing ones.

03

Current models struggle with interpreting and synthesizing complex temporal scene changes.

Abstract

Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world videos as time progresses. To assess the Temporal Compositionality of video generation models, we propose TC-Bench, a benchmark of meticulously crafted text prompts, corresponding ground truth videos, and robust evaluation metrics. The prompts articulate the initial and final states of scenes, effectively reducing ambiguities for frame development and simplifying the assessment of transition completion. In addition, by collecting aligned real-world videos corresponding to the prompts, we expand…

Code & Models

Repositories

weixi-feng/tc-bench (Official)

Taxonomy

Topics: Video Analysis and Summarization · Multimedia Communication and Technology