TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

TL;DR
This paper introduces TC-Bench, a comprehensive benchmark for evaluating the temporal compositionality of text-to-video and image-to-video generation models, highlighting current models' limitations in capturing complex scene transitions over time.
Contribution
The study presents TC-Bench, a novel benchmark with new metrics and aligned real-world videos to evaluate and improve temporal compositionality in video generation models.
Findings
Most models complete fewer than 20% of the expected compositional changes.
New metrics correlate better with human judgments than existing ones.
Current models struggle with interpreting and synthesizing complex temporal scene changes.
Abstract
Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world videos as time progresses. To assess the Temporal Compositionality of video generation models, we propose TC-Bench, a benchmark of meticulously crafted text prompts, corresponding ground truth videos, and robust evaluation metrics. The prompts articulate the initial and final states of scenes, effectively reducing ambiguities for frame development and simplifying the assessment of transition completion. In addition, by collecting aligned real-world videos corresponding to the prompts, we expand…
Taxonomy
Topics: Video Analysis and Summarization · Multimedia Communication and Technology
