Evaluating Design Video Generation: Metrics for Compositional Fidelity
Adrienne Deganutti, Dingning Cao, Jaejung Seol, Elad Hirsch, Purvanshi Mehta

TL;DR
This paper introduces an automated evaluation framework for design video generation. It measures layout fidelity, motion correctness, temporal quality, and content fidelity, giving the field a standardized basis for benchmarking.
Contribution
It provides the first comprehensive, automated evaluation metrics tailored specifically to structured design animation videos, removing the reliance on subjective human assessment.
Findings
The framework covers four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity.
Enables objective benchmarking of generative design video models.
Facilitates consistent comparison across different models and approaches.
Abstract
Generative video models are increasingly used in design animation tasks, yet no standardized evaluation framework exists for this domain. Unlike natural video generation, design animation imposes structured constraints: specific components must animate with prescribed motion types, directions, speeds, and timing, while non-animated regions must remain stable and the layout structure must be preserved. This paper provides a fully automated evaluation framework organized across four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity. The framework eliminates the reliance on subjective human evaluation and establishes a common basis for benchmarking progress in the field.
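
The abstract does not define the individual metrics, so the following is only an illustrative sketch of how one of the stated constraints, that non-animated regions remain stable, could be scored automatically. The function name `static_region_instability`, the grayscale nested-list frame format, and the boolean `animated_mask` are all assumptions made for this example and are not taken from the paper.

```python
def static_region_instability(frames, animated_mask):
    """Mean absolute frame-to-frame change over pixels outside animated_mask.

    Hypothetical helper, not the paper's metric.
    frames: list of T grayscale frames, each a list of H rows of W floats.
    animated_mask: H x W list of bools, True where animation is expected.
    Returns 0.0 when the static region never changes; larger values mean
    more unwanted change in regions that should stay still.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure change")

    total = 0.0  # summed absolute differences over static pixels
    count = 0    # number of (transition, static pixel) pairs visited

    # Compare each frame with its successor.
    for prev, curr in zip(frames, frames[1:]):
        for y, mask_row in enumerate(animated_mask):
            for x, is_animated in enumerate(mask_row):
                if not is_animated:
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1

    # If every pixel is marked as animated, there is nothing to penalize.
    return total / count if count else 0.0
```

Under these assumptions, a clip whose changes fall entirely inside the animated mask scores 0.0, and any flicker or drift outside the mask raises the score. A production version would more likely operate on array data and color frames; plain lists are used here only to keep the sketch dependency-free.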
