Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering

Chang Yu; Junran Peng; Xiangyu Zhu; Zhaoxiang Zhang; Qi Tian; Zhen Lei

arXiv:2401.06345·cs.CV·January 15, 2024·1 cites

Open Access

TL;DR

This paper introduces a prompt learning framework that automatically optimizes textual descriptions to enhance the accuracy of text-to-image diffusion synthesis, especially for complex texts, reducing manual effort.

Contribution

It proposes a novel prompt learning approach utilizing quality and semantic guidance from pre-trained diffusion models to improve image-text alignment.

Findings

01. Enhanced image quality and accuracy with learned prompts

02. Reduced manual prompt engineering effort

03. Validated effectiveness through extensive experiments

Abstract

Text-to-image synthesis with diffusion models has recently shown remarkable performance in generating high-quality images. Although these models perform well on simple texts, they may get confused when faced with complex texts that contain multiple objects or spatial relationships. To obtain the desired images, one feasible approach is to manually adjust the textual descriptions, i.e., rewording the text or adding some words, which is labor-intensive. In this paper, we propose a framework that learns proper textual descriptions for diffusion models through prompt learning. By utilizing the quality guidance and the semantic guidance derived from the pre-trained diffusion model, our method can effectively learn prompts that improve the match between the input text and the generated images. Extensive experiments and analyses validate the effectiveness of the proposed method.

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Video Analysis and Summarization

Methods: Diffusion