Automatic background animation generation aligned with LLM-generated lyrics for children’s songs
Sanghyuck Lee, Timur Khairulov, Ye-Chan Park, Wangduk Seo, Jaesung Lee

TL;DR
This paper introduces an AI-based system that automatically creates animated videos for children's songs by generating lyrics and matching background animations.
Contribution
The novelty lies in combining language models for lyric generation with diffusion models for background animation, tailored for children's songs.
Findings
CascadeSD outperformed conventional diffusion models in generating background images.
Landscape and image-style prompting improved the quality of generated animations.
The proposed pipeline produced better results than existing text-to-video models for children's songs.
Abstract
Media content creation is a labor-intensive and expensive process requiring significant time. Recent developments in artificial intelligence have introduced generative models, which have significant potential in the entertainment industry. Meanwhile, demand for video content tailored to children’s songs has steadily increased, reflecting their significant contribution to early education and entertainment. In this paper, we present a generative model-based approach to automated video creation for children’s songs. The proposed pipeline consists of three key steps: generating lyrics using a language model, producing background images with a diffusion model, and overlaying dynamic visual effects to enhance the final output. Our experiments include a comparison of conventional diffusion models and prompt engineering methods, highlighting the superior performance of CascadeSD and the…
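The three steps named in the abstract (lyric generation, background image generation, effect overlay) can be sketched as a small orchestration function. This is only an illustrative outline, not the authors' implementation: the names `Scene`, `make_video_plan`, `generate_lyrics`, `generate_image`, and `pick_effects` are hypothetical, and the language model, diffusion model, and effect selector are passed in as plain callables so that no particular library API is assumed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Scene:
    """One segment of the video: a lyric line, its background, its effects."""
    lyric: str
    image: Any            # whatever the image generator returns
    effects: List[str]    # names of overlay effects chosen for this line


def make_video_plan(
    theme: str,
    generate_lyrics: Callable[[str], List[str]],   # stand-in for the language model
    generate_image: Callable[[str], Any],          # stand-in for the diffusion model
    pick_effects: Callable[[str], List[str]],      # stand-in for the effect overlay step
) -> List[Scene]:
    """Run the three stages in order and return one Scene per lyric line."""
    scenes: List[Scene] = []
    for line in generate_lyrics(theme):
        line = line.strip()
        if not line:
            # Skip blank lines between verses (an assumption of this sketch).
            continue
        # Assumption: the lyric line itself is used as the image prompt.
        image = generate_image(line)
        scenes.append(Scene(lyric=line, image=image, effects=pick_effects(line)))
    return scenes
```

Passing the three stages as arguments keeps the sketch independent of any specific model; a real system would replace each callable with a call to its chosen language model, image generator, and video compositor.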
Taxonomy
Topics: Human Motion and Animation · Artificial Intelligence in Games · Multimodal Machine Learning Applications
