Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis

Minho Park; Jooyeol Yun; Seunghwan Choi; Jaegul Choo

arXiv:2308.08157 · cs.CV · August 17, 2023

TL;DR

This paper introduces a novel diffusion-based method that generates semantic layouts alongside images to improve text-image correspondence, especially in domain-specific datasets with limited paired data.

Contribution

The paper proposes a Gaussian-categorical diffusion process that jointly generates images and their semantic layouts, improving the semantic understanding of text-to-image models without requiring large-scale paired datasets.
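To make the idea of a joint Gaussian-categorical process concrete, the sketch below noises an (image, layout) pair to a timestep t. It is a minimal illustration under stated assumptions, not the authors' method: the image branch uses standard Gaussian diffusion, the layout branch uses a uniform-transition categorical kernel in the style of multinomial diffusion, both branches share one linear beta schedule, and they are noised independently. The function names `linear_alpha_bar` and `q_sample_joint` are hypothetical. The paper's actual parameterization (for example, how it couples the two modalities) may differ; the official implementation is in the pmh9960/GCDP repository.

```python
import numpy as np


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample_joint(image, layout, t, alpha_bar, num_classes, rng):
    """Hypothetical forward step q(x_t, y_t | x_0, y_0) for an image and its layout.

    image:  float array, e.g. shape (H, W, 3), values roughly in [-1, 1]
    layout: int array of shape (H, W) holding class indices in [0, num_classes)
    Returns the noisy image, the noisy layout, and the per-pixel class probabilities.
    """
    a = alpha_bar[t]

    # Gaussian branch: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(image.shape)
    noisy_image = np.sqrt(a) * image + np.sqrt(1.0 - a) * eps

    # Categorical branch (assumed uniform kernel): keep the original label with
    # weight a and spread the remaining mass uniformly over all K classes.
    onehot = np.eye(num_classes)[layout]            # shape (H, W, K)
    probs = a * onehot + (1.0 - a) / num_classes    # each pixel's row sums to 1

    # Draw one class per pixel by inverse-CDF sampling.
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random(layout.shape + (1,))
    noisy_layout = (u > cdf).sum(axis=-1)
    noisy_layout = np.minimum(noisy_layout, num_classes - 1)  # guard against float round-off
    return noisy_image, noisy_layout, probs


# Usage: noise a random 4x4 RGB "image" and a 5-class layout to the last step.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar(1000)
image = rng.uniform(-1.0, 1.0, size=(4, 4, 3))
layout = rng.integers(0, 5, size=(4, 4))
x_t, y_t, probs = q_sample_joint(image, layout, t=999, alpha_bar=alpha_bar, num_classes=5, rng=rng)
```

The categorical kernel is chosen so that, as t grows, each pixel's label drifts toward a uniform distribution over classes, mirroring how the image drifts toward standard Gaussian noise; a learned reverse process would then denoise both modalities together.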

Findings

01. Improved text-image correspondence in experiments
02. Effective on domain-specific datasets with scarce text-image pairs
03. Guides models to generate semantically aware images

Abstract

Existing text-to-image generation approaches have set high standards for photorealism and text-image correspondence, largely benefiting from web-scale text-image datasets, which can include up to 5 billion pairs. However, text-to-image generation models trained on domain-specific datasets, such as urban scenes, medical images, and faces, still suffer from low text-image correspondence due to the lack of text-image pairs. Additionally, collecting billions of text-image pairs for a specific domain can be time-consuming and costly. Thus, ensuring high text-image correspondence without relying on web-scale text-image datasets remains a challenging task. In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. Specifically, we propose a Gaussian-categorical diffusion process that simultaneously generates both images and…

Code & Models

Repositories

pmh9960/GCDP
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Diffusion