Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

TL;DR
This paper introduces a regional cross-attention module for layout-to-image generation, proposes two evaluation metrics for open-vocabulary scenarios, and validates those metrics through user studies.
Contribution
The study presents a novel regional cross-attention module and new metrics for open-vocabulary layout-to-image generation, addressing existing limitations in representation and evaluation.
Findings
Improved layout region representation with the new module.
Effective evaluation metrics aligned with human preferences.
Enhanced generation quality in complex, detailed scenarios.
Abstract
Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-attention module tailored to enrich layout-to-image generation. This module notably improves the representation of layout regions, particularly in scenarios where existing methods struggle with highly complex and detailed textual descriptions. Moreover, while current open-vocabulary L2I methods are trained in an open-set setting, their evaluations often occur in closed-set environments. To bridge this gap, we propose two metrics to assess L2I performance in open-vocabulary scenarios. Additionally,…
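The abstract does not spell out how the regional cross-attention module is built, so the following is only an illustrative sketch of one common way to restrict cross-attention by layout, not the authors' implementation. It assumes each text token belongs to exactly one layout box and that an image location may attend only to tokens whose box covers it. The function names, the normalized `(x0, y0, x1, y1)` box format, and the choice to output zeros for uncovered locations are all assumptions made for this example.

```python
import numpy as np

def region_mask(boxes, height, width):
    """Hypothetical helper: (height*width, num_regions) boolean coverage mask.

    boxes are (x0, y0, x1, y1) in normalized [0, 1] coordinates (an assumption).
    """
    ys = (np.arange(height) + 0.5) / height   # pixel-center y coordinates
    xs = (np.arange(width) + 0.5) / width     # pixel-center x coordinates
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    masks = []
    for x0, y0, x1, y1 in boxes:
        inside = (xx >= x0) & (xx < x1) & (yy >= y0) & (yy < y1)
        masks.append(inside.reshape(-1))
    return np.stack(masks, axis=1)

def regional_cross_attention(queries, keys, values, token_region, pixel_region_mask):
    """Masked cross-attention sketch (not the paper's exact module).

    queries: (P, d) image features, one row per spatial location
    keys, values: (T, d) and (T, dv) text-token features
    token_region: (T,) index of the layout region each token describes
    pixel_region_mask: (P, R) boolean mask from region_mask
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (P, T) scaled dot products
    allowed = pixel_region_mask[:, token_region]     # (P, T) token visible at location?
    scores = np.where(allowed, scores, -np.inf)      # hide tokens of other regions
    covered = allowed.any(axis=1)                    # locations inside at least one box
    scores[~covered] = 0.0                           # avoid NaN for uncovered rows
    scores -= scores.max(axis=1, keepdims=True)      # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out = weights @ values
    out[~covered] = 0.0                              # assumption: no text signal outside boxes
    return out
```

In this sketch, locations covered by several overlapping boxes simply attend over the tokens of all covering boxes. How overlaps are actually resolved in the paper is not stated in the abstract.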
Taxonomy
Topics: Augmented Reality Applications · Computer Graphics and Visualization Techniques · Digital Imaging in Medicine
Methods: Softmax · Concatenated Skip Connection
