Obtaining Favorable Layouts for Multiple Object Generation
Barak Battash, Amit Rozner, Lior Wolf, Ofir Lindenbaum

TL;DR
This paper introduces a novel diffusion-based method for multi-object image generation that improves layout control and subject separation, addressing limitations of existing models in multi-subject scene synthesis.
Contribution
It proposes a layout-guided diffusion approach with new loss functions to better capture multiple subjects and spatial arrangements in generated images.
Findings
Enhanced fidelity in multi-object image generation
Better adherence to specified layouts and masks
Reduced overlap and clearer spatial separation
Abstract
Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject generation is a critical step towards this goal. However, the existing state-of-the-art diffusion models face difficulty when generating images that involve multiple subjects. When presented with a prompt containing more than one subject, these models may omit some subjects or merge them together. To address this challenge, we propose a novel approach based on a guiding principle. We allow the diffusion model to initially propose a layout, and then we rearrange the layout grid. This is achieved by enforcing cross-attention maps (XAMs) to adhere to proposed masks and by migrating pixels from latent maps to new locations determined by us. We introduce…
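The two mechanisms named in the abstract, making cross-attention maps adhere to masks and migrating latent pixels to new locations, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `mask_adherence_loss` and `migrate_patch`, the "fraction of attention outside the mask" penalty, and the copy-a-rectangular-patch migration rule are all assumptions chosen for illustration, and plain nested lists stand in for real attention and latent tensors.

```python
def mask_adherence_loss(attn, mask):
    """Hypothetical penalty: share of attention mass lying outside the mask.

    attn and mask are 2D lists of equal shape; mask entries are 0 or 1.
    Returns 0.0 when all attention falls inside the mask.
    """
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 0.0  # no attention mass, nothing to penalize
    inside = sum(
        a * m
        for attn_row, mask_row in zip(attn, mask)
        for a, m in zip(attn_row, mask_row)
    )
    return 1.0 - inside / total


def migrate_patch(latent, src, dst, size):
    """Hypothetical migration: copy an (h, w) patch from src to dst.

    src and dst are (row, col) top-left corners. The input grid is left
    unchanged and a new grid is returned. Copying (rather than moving or
    swapping) is a simplification made for this sketch.
    """
    h, w = size
    out = [row[:] for row in latent]
    # Read the whole patch first so overlapping src/dst regions stay correct.
    patch = [latent[src[0] + i][src[1]:src[1] + w] for i in range(h)]
    for i in range(h):
        out[dst[0] + i][dst[1]:dst[1] + w] = patch[i]
    return out


# Toy usage: half of the attention mass sits outside the mask.
attn = [[0.5, 0.0], [0.0, 0.5]]
mask = [[1, 0], [0, 0]]
loss = mask_adherence_loss(attn, mask)

# Toy usage: relocate a 2x2 subject patch to the right half of a 2x4 grid.
latent = [[1, 2, 0, 0], [3, 4, 0, 0]]
moved = migrate_patch(latent, src=(0, 0), dst=(0, 2), size=(2, 2))
```

In a real diffusion pipeline, a penalty of this kind would be computed on per-token cross-attention tensors and used to guide the latent during denoising; the sketch only shows the shape of the two operations.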
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Robotic Path Planning Algorithms · Interactive and Immersive Displays
Methods: Diffusion · ALIGN
