Progressive Text-to-Image Diffusion with Soft Latent Direction

YuTeng Ye; Jiale Cai; Hang Zhou; Guanwen Li; Youjia Zhang; Zikai Song; Chenxing Gao; Junqing Yu; Wei Yang

arXiv:2309.09466 · cs.CV · January 22, 2024 · 1 citation

TL;DR

This paper presents a progressive text-to-image synthesis method. A Large Language Model (LLM) decomposes complex descriptions into step-by-step directives, and a novel Stimulus, Response, and Fusion (SRF) framework manipulates entities. Together they improve the handling of multiple entities and complex instructions.

Contribution

It introduces a progressive synthesis and editing framework built on SRF, which improves the handling of multiple entities and complex text inputs in text-to-image generation.

Findings

01. Enhanced object synthesis from complex, lengthy descriptions

02. New benchmark results on text-to-image tasks

03. Effective manipulation of multiple entities while respecting spatial and relational constraints

Abstract

In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges. This paper introduces an innovative progressive synthesis and editing operation that systematically incorporates entities into the target image, ensuring their adherence to spatial and relational constraints at each sequential step. Our key insight stems from the observation that while a pre-trained text-to-image diffusion model adeptly handles one or two entities, it often falters when dealing with a greater number. To address this limitation, we propose harnessing the capabilities of a Large Language Model (LLM) to decompose intricate and protracted text descriptions into coherent directives adhering to stringent formats. To facilitate the execution of directives involving distinct…
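The abstract describes two cooperating pieces: an LLM that rewrites a long prompt into directives with a strict format, and a progressive loop that adds entities one at a time while checking spatial and relational constraints at each step. The following is a minimal sketch of that control flow only, under stated assumptions. The pipe-delimited directive syntax, the insert/edit/erase operation names, the box-centre relations, and the `render_step` callback are all hypothetical. None of them is taken from the paper or from the babahui/progressive-text-to-image repository. No diffusion model is called; `render_step` only marks where one would run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (x0, y0, x1, y1) in normalized image coordinates; y grows downward.
Box = Tuple[float, float, float, float]

@dataclass
class Directive:
    op: str                       # hypothetical op names: "insert", "edit", "erase"
    entity: str
    box: Optional[Box] = None     # only used by "insert"
    prompt: Optional[str] = None  # only used by "edit"

def parse_directives(text: str) -> List[Directive]:
    """Parse a hypothetical strict format, one directive per line:
        insert | <entity> | x0,y0,x1,y1
        edit   | <entity> | <new description>
        erase  | <entity>
    Any malformed line raises ValueError, so bad LLM output is rejected early."""
    out: List[Directive] = []
    for n, raw in enumerate(text.strip().splitlines(), start=1):
        if not raw.strip():
            continue
        parts = [p.strip() for p in raw.split("|")]
        op = parts[0].lower()
        expected = {"insert": 3, "edit": 3, "erase": 2}.get(op)
        if expected is None or len(parts) != expected:
            raise ValueError(f"line {n}: malformed directive {raw!r}")
        if op == "insert":
            coords = tuple(float(c) for c in parts[2].split(","))
            if len(coords) != 4:
                raise ValueError(f"line {n}: box needs 4 numbers")
            x0, y0, x1, y1 = coords
            if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
                raise ValueError(f"line {n}: box out of range")
            out.append(Directive(op, parts[1], box=(x0, y0, x1, y1)))
        elif op == "edit":
            out.append(Directive(op, parts[1], prompt=parts[2]))
        else:
            out.append(Directive(op, parts[1]))
    return out

def _cx(b: Box) -> float:
    return (b[0] + b[2]) / 2

def _cy(b: Box) -> float:
    return (b[1] + b[3]) / 2

# Relations compare box centres; "above" means a smaller y because y grows downward.
RELATIONS: Dict[str, Callable[[Box, Box], bool]] = {
    "left_of": lambda a, b: _cx(a) < _cx(b),
    "above": lambda a, b: _cy(a) < _cy(b),
}

State = Dict[str, Tuple[Box, str]]  # entity -> (box, current description)

def run_progressive(directives, constraints, render_step=None) -> List[State]:
    """Apply directives one at a time, then check every relational constraint
    whose two entities are both present. render_step is a hypothetical
    placeholder for one per-step diffusion call."""
    state: State = {}
    history: List[State] = []
    for step, d in enumerate(directives, start=1):
        if d.op == "insert":
            if d.entity in state:
                raise ValueError(f"step {step}: {d.entity!r} already exists")
            state[d.entity] = (d.box, d.entity)
        elif d.entity not in state:
            raise ValueError(f"step {step}: cannot {d.op} missing {d.entity!r}")
        elif d.op == "edit":
            state[d.entity] = (state[d.entity][0], d.prompt)  # box kept, text changed
        else:
            del state[d.entity]
        if render_step is not None:
            render_step(step, d, dict(state))
        for rel, a, b in constraints:
            if a in state and b in state and not RELATIONS[rel](state[a][0], state[b][0]):
                raise ValueError(f"step {step}: {a} {rel} {b} violated")
        history.append(dict(state))
    return history

# Usage: a plan an LLM might emit for "a cat left of a dog, a lamp above the dog".
plan = """
insert | cat | 0.05,0.5,0.40,0.95
insert | dog | 0.55,0.5,0.95,0.95
insert | lamp | 0.40,0.05,0.60,0.35
edit | cat | a black cat wearing a red scarf
erase | lamp
"""
constraints = [("left_of", "cat", "dog"), ("above", "lamp", "dog")]
steps = run_progressive(parse_directives(plan), constraints)
print(len(steps), sorted(steps[-1]))  # 5 ['cat', 'dog']
```

Here the loop fails fast on a malformed line or a violated relation. That reflects the abstract's point that constraints are enforced at every sequential step. A real system would presumably re-prompt the LLM or regenerate the step instead of raising.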

Code & Models

Repositories

babahui/progressive-text-to-image (official PyTorch implementation)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Diffusion