Improving Text-to-Image Consistency via Automatic Prompt Optimization
Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

TL;DR
This paper introduces OPT2I, a prompt optimization framework that uses a large language model to improve prompt-image consistency in text-to-image models without fine-tuning them, and reports consistency gains of up to 24.9% on the evaluated datasets.
Contribution
The paper proposes a novel LLM-based prompt optimization method that improves T2I consistency, addressing limitations of existing prompt refinement approaches.
Findings
Boosts the prompt-image consistency score by up to 24.9% on the evaluated datasets
Preserves image quality as measured by FID
Increases recall between generated and real data
Abstract
Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.…
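The optimization-by-prompting idea named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function `optimize_prompt`, the `propose` callback (standing in for an LLM that suggests revised prompts from a scored history), and the `score` callback (standing in for generating images with a T2I model and measuring their consistency with the original user prompt) are all hypothetical. The toy stand-ins only make the loop runnable without any model.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (prompt, consistency score) pairs


def optimize_prompt(
    user_prompt: str,
    propose: Callable[[str, History], List[str]],
    score: Callable[[str, str], float],
    iterations: int = 5,
) -> Tuple[str, float]:
    """Hypothetical propose-score-keep loop; returns the best (prompt, score)."""
    # Start from the user's own prompt, scored against itself.
    history: History = [(user_prompt, score(user_prompt, user_prompt))]
    for _ in range(iterations):
        # Stand-in for the LLM step: suggest revisions given the scored history.
        for candidate in propose(user_prompt, history):
            # Stand-in for the T2I + consistency-metric step.
            history.append((candidate, score(user_prompt, candidate)))
        # Keep the history ordered so the best prompt so far comes first.
        history.sort(key=lambda pair: pair[1], reverse=True)
    return history[0]


# Toy stand-ins (assumptions, not real models): each hint present in a
# candidate prompt raises the pretend consistency score.
HINTS = ["exactly two", "both red", "side by side"]


def toy_propose(user_prompt: str, history: History) -> List[str]:
    best_prompt = max(history, key=lambda pair: pair[1])[0]
    missing = [hint for hint in HINTS if hint not in best_prompt]
    return [f"{best_prompt}, {hint}" for hint in missing[:1]]


def toy_score(user_prompt: str, candidate: str) -> float:
    return sum(hint in candidate for hint in HINTS) / len(HINTS)


best_prompt, best_score = optimize_prompt("two red cubes", toy_propose, toy_score)
```

In a real setting, `score` would be the expensive step, since each call would involve image generation and evaluation, so the number of iterations and candidates per iteration would likely need to be budgeted.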
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Multimedia Communication and Technology
