Evaluating Design Video Generation: Metrics for Compositional Fidelity

Adrienne Deganutti; Dingning Cao; Jaejung Seol; Elad Hirsch; Purvanshi Mehta

arXiv:2605.16223 · cs.GR · May 18, 2026


TL;DR

This paper introduces a fully automated evaluation framework for design video generation. It scores outputs on layout fidelity, motion correctness, temporal quality, and content fidelity, giving the field a common basis for benchmarking.

Contribution

The paper provides the first comprehensive set of automated evaluation metrics tailored to structured design animation videos, removing the reliance on subjective human assessment.

Findings

01

The framework covers four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity.

02

Enables objective benchmarking of generative design video models.

03

Facilitates consistent comparison across different models and approaches.

Abstract

Generative video models are increasingly used in design animation tasks, yet no standardized evaluation framework exists for this domain. Unlike natural video generation, design animation imposes structured constraints: specific components must animate with prescribed motion types, directions, speeds, and timing, while non-animated regions must remain stable and the layout structure must be preserved. This paper provides a fully automated evaluation framework organized across four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity. The framework eliminates the reliance on subjective human evaluation and establishes a common basis for benchmarking progress in the field.
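
The abstract does not spell out how any individual metric is computed. As a purely illustrative sketch, and not the authors' metric, the constraint that non-animated regions must remain stable could be checked by averaging the frame-to-frame intensity change over pixels outside the animated area. The function name `static_region_drift`, the grayscale nested-list frame format, and the boolean `animated_mask` are all assumptions made for this example.

```python
from typing import Sequence

# Assumed representation: a grayscale frame is a grid (rows) of pixel intensities.
Frame = Sequence[Sequence[float]]


def static_region_drift(
    frames: Sequence[Frame],
    animated_mask: Sequence[Sequence[bool]],
) -> float:
    """Mean absolute intensity change between consecutive frames,
    measured only over pixels NOT marked as animated.

    Hypothetical illustration: 0.0 means the static region never changed;
    larger values mean more unwanted change outside the animated area.
    """
    if len(frames) < 2:
        return 0.0  # a single frame has no temporal change to measure

    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for y, mask_row in enumerate(animated_mask):
            for x, is_animated in enumerate(mask_row):
                if not is_animated:
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
    return total / count if count else 0.0


# Usage: one pixel changes between two 2x2 frames.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
mask_animated = [[False, True], [False, False]]  # the changing pixel is animated
mask_static = [[False, False], [False, False]]   # nothing is supposed to move

drift_ok = static_region_drift(frames, mask_animated)
drift_bad = static_region_drift(frames, mask_static)
```

In this toy case the change is ignored when it falls inside the animated mask and penalized when the whole frame is expected to stay still. A real metric would likely need color handling, tolerance to compression noise, and a mask that follows the moving component over time.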
