Generating Animated Layouts as Structured Text Representations

Yeonsang Shin; Jihwan Kim; Yumin Song; Kyungseung Lee; Hyunhee Chung; Taeyoung Na

arXiv:2505.00975 · cs.CV · May 5, 2025

TL;DR

This paper introduces VAKER, a novel text-to-video pipeline that generates animated video advertisements with precise control over layout dynamics using structured text representations.

Contribution

The paper contributes a hierarchical text-based layout representation and a three-stage generation process for automatically creating animated video ads, aimed at finer layout control and higher output quality.

Findings

01. VAKER outperforms existing methods in video ad generation quality.

02. The approach enables fine-grained control over animated graphic layouts.

03. Extensive evaluations validate the effectiveness of the proposed pipeline.

Abstract

Despite the remarkable progress in text-to-video models, achieving precise control over text elements and animated graphics remains a significant challenge, especially in applications such as video advertisements. To address this limitation, we introduce Animated Layout Generation, a novel approach to extend static graphic layouts with temporal dynamics. We propose a Structured Text Representation for fine-grained video control through hierarchical visual elements. To demonstrate the effectiveness of our approach, we present VAKER (Video Ad maKER), a text-to-video advertisement generation pipeline that combines a three-stage generation process with Unstructured Text Reasoning for seamless integration with LLMs. VAKER fully automates video advertisement generation by incorporating dynamic layout trajectories for objects and graphics across specific video frames. Through extensive…
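To make the idea of a structured text representation with per-frame layout trajectories more concrete, here is a minimal Python sketch. It is not the format used by VAKER: the `Keyframe` and `Element` classes, the JSON serialization, the normalized coordinates, and the linear interpolation between keyframes are all assumptions chosen for illustration. It shows one way a hierarchy of visual elements could be written out as text an LLM can read or emit, and how a bounding box at a given frame could be recovered from that text.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Keyframe:
    """Hypothetical keyframe: a bounding box pinned to one video frame."""
    frame: int   # frame index at which this box applies
    x: float     # left edge, normalized to [0, 1] (assumed convention)
    y: float     # top edge, normalized to [0, 1]
    w: float     # width, normalized
    h: float     # height, normalized


@dataclass
class Element:
    """Hypothetical layout node; children form the visual hierarchy."""
    name: str
    kind: str                                   # e.g. "group", "text", "image"
    keyframes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def to_text(element):
    """Serialize the element tree as JSON text (an assumed text format)."""
    return json.dumps(asdict(element), indent=2)


def box_at(element, frame):
    """Return (x, y, w, h) at `frame`, linearly interpolating keyframes.

    Frames before the first or after the last keyframe are clamped.
    Linear interpolation is an assumption made for this sketch.
    """
    kfs = sorted(element.keyframes, key=lambda k: k.frame)
    if not kfs:
        raise ValueError(f"element {element.name!r} has no keyframes")
    fields = ("x", "y", "w", "h")
    if frame <= kfs[0].frame:
        return tuple(getattr(kfs[0], f) for f in fields)
    if frame >= kfs[-1].frame:
        return tuple(getattr(kfs[-1], f) for f in fields)
    for a, b in zip(kfs, kfs[1:]):
        if a.frame < frame <= b.frame:
            t = (frame - a.frame) / (b.frame - a.frame)
            return tuple(
                getattr(a, f) + t * (getattr(b, f) - getattr(a, f))
                for f in fields
            )


# Example: a headline that slides upward over ten frames.
headline = Element(
    name="headline",
    kind="text",
    keyframes=[
        Keyframe(frame=0, x=0.125, y=0.75, w=0.5, h=0.125),
        Keyframe(frame=10, x=0.125, y=0.25, w=0.5, h=0.125),
    ],
)
scene = Element(name="scene", kind="group", children=[headline])

print(to_text(scene))
print(box_at(headline, 5))
```

Keeping the trajectory as a handful of keyframes rather than one box per frame keeps the text short, which matters if the representation is meant to be produced by a language model. Whether the paper's representation uses keyframes, per-frame boxes, or something else is not stated in the summary above.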

Taxonomy

Topics: Human Motion and Animation · Natural Language Processing Techniques · Handwritten Text Recognition Techniques