ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan

TL;DR
ChronoMagic-Bench introduces a comprehensive benchmark for evaluating text-to-video models on their ability to generate diverse, coherent, and metamorphic time-lapse videos across multiple scientific domains, with new metrics and datasets.
Contribution
This work presents a novel benchmark, new evaluation metrics, and a large-scale dataset specifically designed to assess the metamorphic and temporal coherence capabilities of T2V models.
Findings
Models show varied strengths across categories
New metrics effectively quantify metamorphic amplitude and coherence
Benchmark reveals current limitations in T2V models' temporal understanding
Abstract
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of T2V models (e.g., Sora and Lumiere) in time-lapse video generation. In contrast to existing benchmarks that focus on the visual quality and textual relevance of generated videos, ChronoMagic-Bench focuses on a model's ability to generate time-lapse videos with significant metamorphic amplitude and temporal coherence. The benchmark probes T2V models for their physics, biology, and chemistry capabilities through free-form text queries. For these purposes, ChronoMagic-Bench introduces 1,649 prompts with real-world videos as references, categorized into four major types of time-lapse videos: biological, human-created, meteorological, and physical phenomena, which are further divided into 75 subcategories. This categorization comprehensively evaluates the…
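The two-level categorization described in the abstract can be pictured as a small data structure. The following is a minimal sketch, not the benchmark's actual schema: the `PromptRecord` fields, the `count_by_category` helper, and the short category labels are hypothetical names chosen for illustration, and the sample prompts and subcategory labels are invented placeholders rather than benchmark data.

```python
from collections import Counter
from dataclasses import dataclass

# The four major time-lapse types named in the abstract (labels shortened here).
MAJOR_CATEGORIES = ("biological", "human-created", "meteorological", "physical")


@dataclass(frozen=True)
class PromptRecord:
    # Hypothetical record layout; the real benchmark files may be organized differently.
    prompt: str
    category: str
    subcategory: str


def count_by_category(records):
    """Count prompts per major category, rejecting unknown category labels."""
    # Start every known category at zero so empty categories still appear.
    counts = Counter({name: 0 for name in MAJOR_CATEGORIES})
    for record in records:
        if record.category not in MAJOR_CATEGORIES:
            raise ValueError(f"unknown category: {record.category!r}")
        counts[record.category] += 1
    return dict(counts)


# Invented placeholder records, for illustration only.
sample_records = [
    PromptRecord("Time-lapse of a flower blooming", "biological", "plant"),
    PromptRecord("Time-lapse of a seed germinating", "biological", "plant"),
    PromptRecord("Time-lapse of clouds crossing a valley", "meteorological", "cloud"),
]
category_counts = count_by_category(sample_records)
```

A tally of this kind is one simple way to check that an evaluation run covers all four major types before scores are compared per category.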
Taxonomy
Topics: Multimedia Communication and Technology · Video Analysis and Summarization · Video Coding and Compression Technologies
Methods: ALIGN · Focus
