Obtaining Favorable Layouts for Multiple Object Generation

Barak Battash; Amit Rozner; Lior Wolf; Ofir Lindenbaum

arXiv:2405.00791 · cs.CV · May 3, 2024

Open Access

TL;DR

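This paper introduces a diffusion-based method for multi-object image generation. The model first proposes a layout, which is then rearranged by steering cross-attention maps (XAMs) toward proposed masks and migrating latent pixels to new locations. The aim is to keep text-to-image models from omitting or merging subjects when a prompt names more than one.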

Contribution

It proposes a layout-guided diffusion approach with new loss functions to better capture multiple subjects and spatial arrangements in generated images.

Findings

01 Enhanced fidelity in multi-object image generation
02 Better adherence to specified layouts and masks
03 Reduced overlap and clearer spatial separation

Abstract

Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve multiple subjects. When presented with a prompt containing more than one subject, these models may omit some subjects or merge them together. To address this challenge, we propose a novel approach based on a guiding principle. We allow the diffusion model to initially propose a layout, and then we rearrange the layout grid. This is achieved by enforcing cross-attention maps (XAMs) to adhere to proposed masks and by migrating pixels from latent maps to new locations determined by us. We introduce…
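The two mechanisms named in the abstract, making cross-attention maps adhere to masks and moving latent pixels to new locations, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `mask_adherence_loss` and `swap_latent_blocks` are hypothetical, the loss shown (share of attention mass outside the mask) is an assumed stand-in for the paper's loss terms, and plain nested lists stand in for tensors.

```python
def mask_adherence_loss(xam, mask):
    """Share of attention mass that falls outside the mask (0.0 = fully inside).

    Assumption: this is an illustrative penalty, not the loss from the paper.
    `xam` and `mask` are same-shaped 2D lists; `mask` holds 0s and 1s.
    """
    total = sum(sum(row) for row in xam)
    if total == 0:
        return 0.0  # no attention mass at all, nothing to penalize
    inside = sum(a * m
                 for xam_row, mask_row in zip(xam, mask)
                 for a, m in zip(xam_row, mask_row))
    return 1.0 - inside / total


def swap_latent_blocks(latent, src, dst, size):
    """Return a copy of `latent` with two size x size blocks exchanged.

    Assumption: the blocks at top-left corners `src` and `dst` do not overlap.
    This only mimics "migrating pixels" on a single-channel 2D grid.
    """
    out = [row[:] for row in latent]  # leave the input untouched
    (sr, sc), (dr, dc) = src, dst
    for i in range(size):
        for j in range(size):
            out[dr + i][dc + j] = latent[sr + i][sc + j]
            out[sr + i][sc + j] = latent[dr + i][dc + j]
    return out


# Half of the attention mass sits inside the mask, half outside.
xam = [[0.5, 0.0], [0.0, 0.5]]
mask = [[1, 0], [0, 0]]
loss = mask_adherence_loss(xam, mask)

# Move the top-left latent cell to the bottom-right corner (and vice versa).
latent = [[1, 2], [3, 4]]
moved = swap_latent_blocks(latent, (0, 0), (1, 1), 1)
```

In a real pipeline a penalty of this kind would be computed on the model's attention tensors and minimized by gradient steps on the latent during denoising; the sketch only shows what quantity such a penalty could measure.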

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Robotic Path Planning Algorithms · Interactive and Immersive Displays

Methods: Diffusion · ALIGN