Layout-Bridging Text-to-Image Synthesis

Jiadong Liang; Wenjie Pei; Feng Lu

arXiv:2208.06162 · cs.CV · August 15, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a novel approach for text-to-image synthesis that emphasizes effective layout modeling through Transformer-based text-to-layout generation and layout-to-image synthesis, improving semantic consistency and spatial accuracy.

Contribution

It proposes a new Transformer-based framework for joint text-to-layout and layout-to-image synthesis, along with a novel Layout Quality Score metric for evaluating layout quality.

Findings

01

Outperforms state-of-the-art methods in layout prediction

02

Achieves higher image synthesis quality from text descriptions

03

Demonstrates effective modeling of spatial relationships

Abstract

The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, can capture only the keywords in the text that indicate common objects or actions, and fail to learn their spatial distribution patterns. An effective way to circumvent this limitation is to generate an image layout as guidance, which has been attempted by a few methods. Nevertheless, these methods fail to generate practically effective layouts due to the diversity of input text and object location. In this paper, we push for effective modeling in both text-to-layout generation and layout-to-image synthesis. Specifically, we formulate the text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon Transformer to learn the spatial…
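To make the sequence-to-sequence framing concrete, here is a minimal sketch of one way a layout could be flattened into a token sequence that a Transformer decoder might emit one token at a time. It is an illustration under stated assumptions, not the paper's actual encoding: the `Box` type, the `CLASSES` vocabulary, the `NUM_BINS` resolution, and the five-tokens-per-object scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumptions for illustration only (not taken from the paper):
NUM_BINS = 32                            # coordinate quantization resolution
CLASSES = ["person", "dog", "frisbee"]   # hypothetical object vocabulary


@dataclass
class Box:
    label: str
    x: float  # left edge, normalized to [0, 1]
    y: float  # top edge, normalized to [0, 1]
    w: float  # width, normalized to [0, 1]
    h: float  # height, normalized to [0, 1]


def quantize(v: float) -> int:
    """Map a normalized coordinate to a discrete bin index."""
    return min(NUM_BINS - 1, max(0, int(v * NUM_BINS)))


def dequantize(b: int) -> float:
    """Map a bin index back to the center of its bin."""
    return (b + 0.5) / NUM_BINS


def layout_to_tokens(boxes: List[Box]) -> List[int]:
    """Flatten a layout into [class, x, y, w, h, class, x, ...] token ids.

    Class ids occupy [0, len(CLASSES)); coordinate bins are offset past them
    so both kinds of token share one vocabulary.
    """
    tokens: List[int] = []
    for box in boxes:
        tokens.append(CLASSES.index(box.label))
        for v in (box.x, box.y, box.w, box.h):
            tokens.append(len(CLASSES) + quantize(v))
    return tokens


def tokens_to_layout(tokens: List[int]) -> List[Box]:
    """Invert layout_to_tokens, ignoring any incomplete trailing group."""
    boxes: List[Box] = []
    usable = len(tokens) - len(tokens) % 5
    for i in range(0, usable, 5):
        label = CLASSES[tokens[i]]
        x, y, w, h = (dequantize(t - len(CLASSES)) for t in tokens[i + 1:i + 5])
        boxes.append(Box(label, x, y, w, h))
    return boxes
```

With this kind of encoding, predicting a layout from text reduces to predicting the next token id given the caption and the tokens produced so far, which is the shape of problem a standard encoder-decoder Transformer is designed for. Quantization makes the round trip lossy: each decoded coordinate is within one bin width of the original.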

Taxonomy

Topics: Handwritten Text Recognition Techniques · Human Motion and Animation · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Dropout