Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
Seunghoon Hong, Dingdong Yang, Jongwook Choi, Honglak Lee

TL;DR
This paper introduces a hierarchical text-to-image synthesis method that infers semantic layouts from text, improving image quality, interpretability, and semantic alignment by decomposing the generation process into layout and image synthesis stages.
Contribution
The paper presents a novel hierarchical approach that infers semantic layouts from text and generates images conditioned on these layouts, enhancing control and interpretability.
Findings
Improves image quality over existing methods
Enhances semantic alignment with input text
Allows automatic annotation and user-controlled scene modification
Abstract
We propose a novel hierarchical approach for text-to-image synthesis by inferring semantic layout. Instead of learning a direct mapping from text to image, our algorithm decomposes the generation process into multiple steps, in which it first constructs a semantic layout from the text by the layout generator and converts the layout to an image by the image generator. The proposed layout generator progressively constructs a semantic layout in a coarse-to-fine manner by generating object bounding boxes and refining each box by estimating object shapes inside the box. The image generator synthesizes an image conditioned on the inferred semantic layout, which provides a useful semantic structure of an image matching with the text description. Our model not only generates semantically more meaningful images, but also allows automatic annotation of generated images and user-controlled…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
MethodsInterpretability
