Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching

Minghao Liu; Le Zhang; Yingjie Tian; Xiaochao Qu; Luoqi Liu; Ting Liu

arXiv:2408.13858 · cs.CV · August 27, 2024

Open Access

TL;DR

This paper introduces a training-free diffusion framework called Complex Diffusion (CxD) that mimics the artist's process of creating complex scenes through composition, painting, and retouching, leveraging language models for prompt decomposition and scene management.

Contribution

The paper presents a three-stage, diffusion-based method for complex scene generation that requires no additional training. The method is guided by a new definition of complex scenes and by the Complex Decomposition Criteria (CDC) derived from that definition.

Findings

01. Outperforms previous SOTA methods in complex scene generation

02. Produces high-quality, semantically consistent, and diverse images

03. Effectively manages intricate prompts for detailed scene creation

Abstract

Recent advances in text-to-image diffusion models have demonstrated impressive capabilities in image quality. However, complex scene generation remains relatively unexplored, and even the definition of 'complex scene' itself remains unclear. In this paper, we address this gap by providing a precise definition of complex scenes and introducing a set of Complex Decomposition Criteria (CDC) based on this definition. Inspired by the artist's painting process, we propose a training-free diffusion framework called Complex Diffusion (CxD), which divides the process into three stages: composition, painting, and retouching. Our method leverages the powerful chain-of-thought capabilities of large language models (LLMs) to decompose complex prompts based on CDC and to manage composition and layout. We then develop an attention modulation method that guides simple prompts to specific regions to…
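The abstract mentions an attention modulation method that guides simple prompts to specific regions, but the text shown here is cut off before any formula is given. The sketch below is therefore not the paper's method. It is a generic illustration of region-masked cross-attention, assuming each image location and each prompt token carries a region label; the names `modulate_attention`, `pixel_region`, `token_region`, and `penalty` are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def modulate_attention(scores, pixel_region, token_region, penalty=1e9):
    """Region-masked cross-attention weights (illustrative assumption only).

    scores[i][j]    : pre-softmax attention logit of image location i on token j
    pixel_region[i] : region id assigned to image location i by the layout
    token_region[j] : region id of the sub-prompt token j belongs to,
                      or None for tokens that may attend everywhere
    penalty         : amount subtracted from logits of tokens that belong
                      to a different region than the image location
    """
    weights = []
    for i, row in enumerate(scores):
        adjusted = []
        for j, s in enumerate(row):
            region = token_region[j]
            # Suppress tokens whose sub-prompt targets another region.
            if region is not None and region != pixel_region[i]:
                s = s - penalty
            adjusted.append(s)
        weights.append(softmax(adjusted))
    return weights


# Two image locations, three tokens: one global token and one per region.
scores = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
weights = modulate_attention(scores, pixel_region=[0, 1], token_region=[None, 0, 1])
print(weights)
```

In this toy layout each image location splits its attention between the global token and the token of its own region, while the token of the other region receives zero weight. A smaller `penalty` would give a soft bias instead of a hard mask.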

Taxonomy

Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Diffusion