TaleCrafter: Interactive Story Visualization with Multiple Characters
Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang

TL;DR
TaleCrafter is an interactive story visualization system that leverages large language and text-to-image models to generate consistent images of stories with multiple characters, while letting users edit scenes and layouts.
Contribution
The paper introduces a system that chains story-to-prompt generation, text-to-layout generation, controllable text-to-image synthesis, and image-to-video animation, allowing flexible editing of layouts and local structure and handling new characters and scenes.
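The four-stage flow described above can be sketched as plain data passed from one stage to the next. This is a hypothetical illustration, not the authors' implementation: every function and class name is invented, and the language model, layout model, image generator, and animator are replaced by trivial stand-ins (sentence splitting, equal-width boxes, placeholder strings). It only shows how keeping an explicit per-frame layout lets one frame be edited and re-rendered without regenerating the rest of the story.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Box:
    # A character's region in normalized image coordinates (0..1).
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Frame:
    prompt: str
    characters: List[str]
    layout: List[Box] = field(default_factory=list)
    image: Optional[str] = None  # placeholder for a synthesized image
    clip: Optional[str] = None   # placeholder for an animated clip


def story_to_prompts(story: str, cast: List[str]) -> List[Frame]:
    # Hypothetical stand-in for the language-model stage: one frame per
    # sentence, tagged with every cast member whose name appears in it.
    frames = []
    for sentence in story.split("."):
        sentence = sentence.strip()
        if sentence:
            present = [c for c in cast if c in sentence]
            frames.append(Frame(prompt=sentence, characters=present))
    return frames


def generate_layout(frame: Frame) -> Frame:
    # Hypothetical stand-in for a text-to-layout model: equal-width
    # vertical strips, one per character.
    n = len(frame.characters)
    frame.layout = [Box(c, i / n, 0.2, (i + 1) / n, 1.0)
                    for i, c in enumerate(frame.characters)]
    return frame


def synthesize(frame: Frame) -> Frame:
    # Placeholder for layout-conditioned image synthesis: record which
    # horizontal span each character would be rendered into.
    regions = ", ".join(f"{b.name}@({b.x0:.2f},{b.x1:.2f})" for b in frame.layout)
    frame.image = f"<image: {frame.prompt} | {regions}>"
    return frame


def animate(frame: Frame) -> Frame:
    # Placeholder for the image-to-video stage.
    frame.clip = f"<clip of {frame.image}>"
    return frame


def visualize(story: str, cast: List[str]) -> List[Frame]:
    # Run every stage in order for each frame of the story.
    return [animate(synthesize(generate_layout(f)))
            for f in story_to_prompts(story, cast)]


def edit_box(frame: Frame, name: str, x0: float, x1: float) -> Frame:
    # Interactive edit: move one character horizontally, then re-render
    # only this frame; other frames are left untouched.
    for box in frame.layout:
        if box.name == name:
            box.x0, box.x1 = x0, x1
    return animate(synthesize(frame))


frames = visualize("Alice meets Bob in the forest. Bob waves.", ["Alice", "Bob"])
frames[1] = edit_box(frames[1], "Bob", 0.5, 0.9)
for f in frames:
    print(f.image)
```

In this sketch, the layout is explicit data rather than something baked into the image, so a layout edit only re-runs synthesis and animation for the affected frame.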
Findings
Effective story visualization with multiple characters.
Supports interactive editing of layouts and local structures.
Validated through experiments and user studies.
Abstract
Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language and T2I models, trained on massive corpora. The system comprises four…
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Multimodal Machine Learning Applications
