Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

Jiaxin Cheng; Zixu Zhao; Tong He; Tianjun Xiao; Yicong Zhou; Zheng Zhang

arXiv:2409.04847 · cs.CV · January 14, 2025

TL;DR

This paper introduces a regional cross-attention module for layout-to-image generation, proposes two evaluation metrics for open-vocabulary scenarios, and validates those metrics through user studies.

Contribution

The study presents a regional cross-attention module that improves how layout regions are represented, together with two metrics for evaluating open-vocabulary layout-to-image generation, a setting that existing closed-set evaluations do not cover.

Findings

01. Improved layout region representation with the new module.

02. Effective evaluation metrics aligned with human preferences.

03. Enhanced generation quality in complex, detailed scenarios.

Abstract

Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-attention module tailored to enrich layout-to-image generation. This module notably improves the representation of layout regions, particularly in scenarios where existing methods struggle with highly complex and detailed textual descriptions. Moreover, while current open-vocabulary L2I methods are trained in an open-set setting, their evaluations often occur in closed-set environments. To bridge this gap, we propose two metrics to assess L2I performance in open-vocabulary scenarios. Additionally,…

Code & Models

Repositories

cplusx/rich_context_l2i (PyTorch, official)

Videos

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation · SlidesLive

Taxonomy

Topics: Augmented Reality Applications · Computer Graphics and Visualization Techniques · Digital Imaging in Medicine

Methods: Softmax · Concatenated Skip Connection