Prompt Refinement with Image Pivot for Text-to-Image Generation
Jingtao Zhan, Qingyao Ai, Yiqun Liu, Yingwei Pan, Ting Yao, Jiaxin Mao, Shaoping Ma, Tao Mei

TL;DR
This paper introduces PRIP, a novel zero-shot prompt refinement method for text-to-image generation that uses image representations as pivots, improving prompt translation without needing parallel corpora.
Contribution
PRIP leverages image representations as intermediaries, enabling effective prompt refinement in a zero-shot setting without parallel training data.
Findings
PRIP outperforms various baselines in prompt refinement tasks.
PRIP effectively generalizes to unseen systems in zero-shot scenarios.
Extensive experiments validate the superiority of PRIP over existing methods.
Abstract
For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data…
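The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`make_pivot_refiner`, `toy_user_to_image`, `toy_image_to_system`), the two-dimensional "image representation", and the keyword table are all hypothetical stand-ins chosen only to show how two separately trained stages compose through a pivot.

```python
from typing import Callable, List

Vector = List[float]


def make_pivot_refiner(
    user_to_image: Callable[[str], Vector],
    image_to_system: Callable[[Vector], str],
) -> Callable[[str], str]:
    """Compose two independently built stages into one prompt refiner."""

    def refine(user_prompt: str) -> str:
        # Stage 1: user language -> representation of the preferred image.
        pivot = user_to_image(user_prompt)
        # Stage 2: image representation -> keyword-enriched system prompt.
        return image_to_system(pivot)

    return refine


# Toy stand-ins (assumptions for illustration, not real models).
def toy_user_to_image(prompt: str) -> Vector:
    words = prompt.lower().split()
    # Hypothetical 2-d "latent": [contains a cat, contains a sunset].
    return [float("cat" in words), float("sunset" in words)]


def toy_image_to_system(pivot: Vector) -> str:
    keywords = ["cat, highly detailed fur", "sunset, golden hour lighting"]
    # Emit the keyword group for every active dimension of the pivot.
    return ", ".join(k for k, v in zip(keywords, pivot) if v > 0.5)


refine = make_pivot_refiner(toy_user_to_image, toy_image_to_system)
print(refine("a cat at sunset"))
```

The point of the design is that neither stage needs paired user-prompt and system-prompt examples: each stage can, in principle, be trained on its own data source and the two meet only at the image representation.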
Taxonomy
Topics: Image Retrieval and Classification Techniques · Image Processing Techniques and Applications
