Prompt Refinement with Image Pivot for Text-to-Image Generation

Jingtao Zhan; Qingyao Ai; Yiqun Liu; Yingwei Pan; Ting Yao; Jiaxin Mao; Shaoping Ma; Tao Mei

arXiv:2407.00247·cs.CV·July 2, 2024

TL;DR

This paper introduces PRIP, a novel zero-shot prompt refinement method for text-to-image generation that uses image representations as pivots, improving prompt translation without needing parallel corpora.

Contribution

PRIP leverages image representations as intermediaries, enabling effective prompt refinement in a zero-shot setting without parallel training data.

Findings

1. PRIP outperforms various baselines in prompt refinement tasks.
2. PRIP effectively generalizes to unseen systems in zero-shot scenarios.
3. Extensive experiments validate the superiority of PRIP over existing methods.

Abstract

For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data…
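
The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the three-dimensional "concept" vector standing in for an image latent, and the keyword table are all hypothetical toy stand-ins, not the models or API of the paper or its repository. The sketch only shows how a pivot representation sits between the user-language input and the system-language output.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical concept axes standing in for a learned image latent space.
CONCEPTS: List[str] = ["cat", "sunset", "forest"]

# Hypothetical "system language" keywords tied to each concept axis.
SYSTEM_KEYWORDS: Dict[str, str] = {
    "cat": "cat, detailed fur, sharp focus",
    "sunset": "sunset, golden hour, dramatic lighting",
    "forest": "forest, volumetric fog, highly detailed",
}


def toy_infer_image_repr(user_prompt: str) -> Vector:
    """Stage 1 stand-in: map a user-language prompt to a pivot vector."""
    words = user_prompt.lower().split()
    return [1.0 if concept in words else 0.0 for concept in CONCEPTS]


def toy_decode_system_prompt(pivot: Vector) -> str:
    """Stage 2 stand-in: map a pivot vector to a keyword-enriched prompt."""
    parts = [
        SYSTEM_KEYWORDS[concept]
        for concept, weight in zip(CONCEPTS, pivot)
        if weight > 0.5
    ]
    return ", ".join(parts)


def refine_prompt(
    user_prompt: str,
    infer: Callable[[str], Vector] = toy_infer_image_repr,
    decode: Callable[[Vector], str] = toy_decode_system_prompt,
) -> str:
    """Compose the two stages; the pivot is the only link between them."""
    pivot = infer(user_prompt)
    return decode(pivot)


if __name__ == "__main__":
    print(refine_prompt("a cat in a forest"))
    # cat, detailed fur, sharp focus, forest, volumetric fog, highly detailed
```

Because the two stages communicate only through the pivot, each could in principle be trained on its own data source, which is the property the abstract relies on to avoid parallel user-to-system prompt corpora.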

Code & Models

Repositories

jingtaozhan/promptreformulate (PyTorch, official)

Videos

Prompt Refinement with Image Pivot for Text-to-Image Generation (Underline)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Image Processing Techniques and Applications