Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen

TL;DR
This paper introduces Animate-A-Story, a framework that synthesizes coherent storytelling videos by retrieving relevant video clips and customizing their appearances using text prompts, simplifying the video creation process.
Contribution
It presents a novel two-module system combining retrieval and guided synthesis for controllable, plot-aligned video generation from existing clips.
Findings
Outperforms existing baselines in coherence and customization.
Enables flexible control over characters and structure.
Produces visually consistent storytelling videos.
Abstract
Generating videos for visual storytelling can be a tedious and complex process that typically requires either live-action filming or graphics animation rendering. To bypass these challenges, our key idea is to utilize the abundance of existing video clips and synthesize a coherent storytelling video by customizing their appearances. We achieve this by developing a framework comprised of two functional modules: (i) Motion Structure Retrieval, which provides video candidates with desired scene or motion context described by query texts, and (ii) Structure-Guided Text-to-Video Synthesis, which generates plot-aligned videos under the guidance of motion structure and text prompts. For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure. For the second module, we propose a controllable video generation model that offers flexible…
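The two-module flow described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the names `Clip`, `retrieve`, `extract_depth`, `synthesize`, and `make_shot` are hypothetical, the retrieval step swaps in a toy word-overlap (Jaccard) ranking over captions in place of the off-the-shelf retrieval system, and the depth and synthesis steps are placeholders that only show how data would pass from one module to the next.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str  # hypothetical identifier for a clip in the library
    caption: str  # text description used for retrieval


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split on whitespace."""
    return set(text.lower().split())


def retrieve(query: str, library: list[Clip]) -> Clip:
    """Toy stand-in for Motion Structure Retrieval.

    Ranks clips by Jaccard overlap between query and caption words.
    This is an assumption for illustration, not the paper's retriever.
    """
    query_words = tokenize(query)

    def score(clip: Clip) -> float:
        caption_words = tokenize(clip.caption)
        union = query_words | caption_words
        return len(query_words & caption_words) / len(union) if union else 0.0

    return max(library, key=score)


def extract_depth(clip: Clip) -> str:
    """Placeholder for depth extraction; returns a label, not real depth maps."""
    return f"depth({clip.clip_id})"


def synthesize(structure: str, prompt: str) -> dict:
    """Placeholder for Structure-Guided Text-to-Video Synthesis.

    Only records the two conditioning inputs named in the abstract.
    """
    return {"structure": structure, "prompt": prompt}


def make_shot(query: str, prompt: str, library: list[Clip]) -> dict:
    """Chain the modules: retrieve a clip, take its structure, then synthesize."""
    clip = retrieve(query, library)
    return synthesize(extract_depth(clip), prompt)


library = [
    Clip("a", "a man walking on the beach"),
    Clip("b", "a car driving through a city"),
]
shot = make_shot(
    query="person walking along a beach",
    prompt="a teddy bear walking on the beach, cartoon style",
    library=library,
)
```

In this sketch the query text selects the motion and scene layout, while the separate prompt controls the appearance of the generated shot, which mirrors the split between the two modules.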
Taxonomy
Topics: Video Analysis and Summarization · Human Motion and Animation · Artificial Intelligence in Games
