AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

Wen Wang; Canyu Zhao; Hao Chen; Zhekai Chen; Kecheng Zheng; Chunhua Shen

arXiv:2311.11243·cs.CV·November 21, 2023·2 cites

TL;DR

AutoStory introduces an automated system for generating diverse, high-quality, and consistent storytelling images with minimal human effort by combining large language models and text-to-image models, enhancing story visualization applications.

Contribution

The paper presents a novel automated story visualization pipeline that integrates layout planning, dense control condition generation, and multi-view character consistency without extensive human input.

Findings

1. Effective layout planning using large language models.
2. Dense control conditions improve image quality.
3. Multi-view character consistency achieved without manual labor.

Abstract

Story visualization aims to generate a series of images that match the story described in texts, and it requires the generated images to satisfy high quality, alignment with the text description, and consistency in character identities. Given the complexity of story visualization, existing methods drastically simplify the problem by considering only a few specific characters and scenarios, or requiring the users to provide per-image control conditions such as sketches. However, these simplifications render these methods incompetent for real applications. To this end, we propose an automated story visualization system that can effectively generate diverse, high-quality, and consistent sets of story images, with minimal human interactions. Specifically, we utilize the comprehension and planning capabilities of large language models for layout planning, and then leverage large-scale…

Code & Models

Repositories

aim-uofa/AutoStory (official)

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques