Control and Realism: Best of Both Worlds in Layout-to-Image without Training

Bonan Li; Yinhan Hu; Songhua Liu; Xinchao Wang

arXiv:2506.15563 · cs.CV · June 19, 2025

TL;DR

This paper introduces WinWinLay, a training-free method for layout-to-image generation that improves control precision and realism by addressing attention biases and out-of-distribution artifacts, outperforming existing methods.

Contribution

WinWinLay proposes a novel training-free approach that combines a Non-local Attention Energy Function with an Adaptive Update strategy to enhance layout control and image realism in diffusion models.

Findings

01. Outperforms state-of-the-art methods in layout control accuracy.
02. Achieves higher photorealism in generated images.
03. Effectively reduces artifacts and localization errors.

Abstract

Layout-to-Image generation aims to create complex scenes with precise control over the placement and arrangement of subjects. Existing works have demonstrated that pre-trained Text-to-Image diffusion models can achieve this goal without training on any specific data; however, they often face challenges with imprecise localization and unrealistic artifacts. Focusing on these drawbacks, we propose a novel training-free method, WinWinLay. At its core, WinWinLay presents two key strategies, Non-local Attention Energy Function and Adaptive Update, that collaboratively enhance control precision and realism. On one hand, we theoretically demonstrate that the commonly used attention energy function introduces inherent spatial distribution biases, hindering objects from being uniformly aligned with layout instructions. To overcome this issue, a non-local attention prior is explored to redistribute…
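To make the energy-function discussion concrete, the sketch below implements a box-ratio cross-attention energy, E = (1 - ΣA_inside / ΣA)², of the kind widely used in earlier training-free layout guidance. It is an assumption that this matches the "commonly used" function the abstract refers to, and it is not WinWinLay's Non-local Attention Energy Function. The function name `box_ratio_energy` and the toy 8×8 attention maps are hypothetical. The example shows that this energy only measures how much attention mass falls inside the box, so a map collapsed onto a single pixel scores the same as one spread evenly over the box.

```python
import numpy as np


def box_ratio_energy(attn: np.ndarray, box_mask: np.ndarray) -> float:
    """Hypothetical box-ratio energy for one object token (assumed form, not WinWinLay's).

    attn: non-negative cross-attention map of shape (H, W) for the token.
    box_mask: boolean mask of shape (H, W), True inside the layout box.
    Returns (1 - attention mass inside the box / total attention mass) ** 2.
    """
    total = attn.sum()
    inside = attn[box_mask].sum()
    return float((1.0 - inside / total) ** 2)


H = W = 8
box = np.zeros((H, W), dtype=bool)
box[2:6, 2:6] = True  # a 4x4 layout box

# Attention spread uniformly over the whole box.
uniform = box.astype(float)

# Attention collapsed onto a single pixel inside the box.
peaked = np.zeros((H, W))
peaked[3, 3] = 1.0

# Attention split evenly between one pixel inside and one pixel outside the box.
leaky = np.zeros((H, W))
leaky[3, 3] = 1.0
leaky[0, 0] = 1.0

print(box_ratio_energy(uniform, box))  # 0.0
print(box_ratio_energy(peaked, box))   # 0.0
print(box_ratio_energy(leaky, box))    # 0.25
```

Because this energy is indifferent to where the mass sits inside the box, minimizing it alone exerts no pressure toward covering the box evenly. This illustrates the general kind of limitation the abstract describes; it is not a reproduction of the paper's own theoretical analysis.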

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging