GLoD: Composing Global Contexts and Local Details in Image Generation

Moyuru Yamada

arXiv:2404.15447 · cs.CV · April 25, 2024 · 1 citation

TL;DR

GLoD introduces a framework that enables simultaneous control over global contexts and local details in text-to-image diffusion models, improving the synthesis of complex, attribute-specific images without additional training.

Contribution

The paper presents a novel global-local composition method that guides pre-trained diffusion models to generate images with fine-grained control over both global contexts and local details.

Findings

01. Generates complex images with the specified object interactions.

02. Preserves object identities and attributes in generated images.

03. Requires no additional training or fine-tuning.

Abstract

Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) remains a significant challenge. The models often fail to understand complex descriptions involving multiple objects, and they either apply specified visual attributes to the wrong targets or ignore them. This paper presents Global-Local Diffusion (GLoD), a novel framework which allows simultaneous control over the global contexts and the local details in text-to-image generation without requiring training or fine-tuning. It assigns multiple global and local prompts to corresponding layers and composes their noises to guide a denoising process using pre-trained diffusion models. Our framework enables complex global-local…
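The abstract describes composing the noises predicted for several global and local prompts, but this page does not give the exact rule. The sketch below is one plausible reading, written as a minimal illustration rather than the paper's formulation: it assumes a classifier-free-guidance-style combination in which the global prompt is guided against an unconditional prediction and each local prompt is guided against the global prediction inside a region mask. The function name `compose_noise`, the masks, and the weights `w_global` and `w_local` are all hypothetical.

```python
from typing import List, Sequence

Noise = List[float]  # a flattened noise prediction, one value per latent element


def compose_noise(
    eps_uncond: Noise,
    eps_global: Noise,
    eps_locals: Sequence[Noise],
    masks: Sequence[Sequence[float]],
    w_global: float = 7.5,
    w_local: float = 5.0,
) -> Noise:
    """Combine one global and several local noise predictions.

    Assumed rule (illustrative only, not taken from the paper):
      eps = eps_uncond + w_global * (eps_global - eps_uncond)
            + sum_i w_local * mask_i * (eps_local_i - eps_global)
    """
    if len(eps_locals) != len(masks):
        raise ValueError("need exactly one mask per local prompt")
    composed = []
    for k in range(len(eps_uncond)):
        # Global guidance: push away from the unconditional prediction.
        value = eps_uncond[k] + w_global * (eps_global[k] - eps_uncond[k])
        # Local guidance: inside each mask, push from the global prediction
        # toward the prediction for the more detailed local prompt.
        for eps_local, mask in zip(eps_locals, masks):
            value += w_local * mask[k] * (eps_local[k] - eps_global[k])
        composed.append(value)
    return composed


# Toy 4-element "latent"; all numbers are made up for illustration.
eps_uncond = [0.0, 0.0, 0.0, 0.0]
eps_global = [1.0, 1.0, 1.0, 1.0]
eps_local = [2.0, 2.0, 2.0, 2.0]
mask = [1.0, 1.0, 0.0, 0.0]  # the local prompt affects the first two elements

result = compose_noise(
    eps_uncond, eps_global, [eps_local], [mask], w_global=2.0, w_local=0.5
)
print(result)  # [2.5, 2.5, 2.0, 2.0]
```

In a real pipeline the three kinds of prediction would come from the same pre-trained denoiser evaluated with different prompts at each step, which is consistent with the claim that no additional training or fine-tuning is needed.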

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion