Kinetic Typography Diffusion Model
Seonmi Park, Inhwan Bae, Seunghyun Shin, Hae-Gon Jeon

TL;DR
This paper presents a diffusion-based method for generating kinetic typography videos that are both artistic and legible, trained on a dataset of about 600K videos and guided by static and dynamic captions.
Contribution
It introduces a new dataset of about 600K kinetic typography videos and a video diffusion model that uses static and dynamic captions as spatial and temporal guidance.
Findings
Generated videos have legible and artistic letter motions.
The model effectively incorporates static and dynamic guidance.
Glyph loss improves letter readability.
Abstract
This paper introduces a method for realistic kinetic typography that generates user-preferred animatable 'text content'. We draw on recent advances in guided video diffusion models to achieve visually pleasing text appearances. To do this, we first construct a kinetic typography dataset comprising about 600K videos. The dataset is built from a variety of combinations of 584 templates designed by professional motion graphics designers, and involves changing each letter's position, glyph, and size (e.g., flying, glitches, chromatic aberration, and reflecting effects). Next, we propose a video diffusion model for kinetic typography, which must satisfy three requirements: aesthetic appearances, motion effects, and readable letters. To address these requirements, we present static and dynamic captions used as spatial and temporal guidance of a video diffusion model,…
Taxonomy
Topics: Subtitles and Audiovisual Media
Methods: Diffusion · Convolution
