MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising

Bingyuan Wang; Hengyu Meng; Zeyu Cai; Lanjiong Li; Yue Ma; Qifeng Chen; Zeyu Wang

arXiv:2312.10899 · cs.CV · December 19, 2023 · 1 citation

Open Access

TL;DR

MagicScroll is a diffusion-based framework that enables controllable, coherent, and expressive nontypical aspect-ratio image generation for visual storytelling, addressing previous limitations in style, layout, and content diversity.

Contribution

The paper introduces a multi-layered, semantic-aware denoising process and establishes the first benchmark for nontypical aspect-ratio image generation in visual storytelling.

Findings

01. Improves alignment with narrative text

02. Enhances visual coherence and engagement

03. Provides fine-grained control over image content

Abstract

Visual storytelling often uses nontypical aspect-ratio images like scroll paintings, comic strips, and panoramas to create an expressive and compelling narrative. While generative AI has achieved great success and shown the potential to reshape the creative industry, it remains a challenge to generate coherent and engaging content with arbitrary size and controllable style, concept, and layout, all of which are essential for visual storytelling. To overcome the shortcomings of previous methods including repetitive content, style inconsistency, and lack of controllability, we propose MagicScroll, a multi-layered, progressive diffusion-based image generation framework with a novel semantic-aware denoising process. The model enables fine-grained control over the generated image on object, scene, and background levels with text, image, and layout conditions. We also establish the first…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques