Improving Text-to-Image Consistency via Automatic Prompt Optimization

Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

arXiv:2403.17804 · cs.CV · March 27, 2024 · 6 citations

TL;DR

This paper introduces OPT2I, a prompt optimization framework that uses a large language model (LLM) to improve text-to-image (T2I) consistency without fine-tuning the T2I model. It is validated on multiple datasets and reports consistency gains of up to 24.9%.

Contribution

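The paper proposes a novel LLM-based prompt optimization method that improves T2I consistency while addressing limitations of existing prompt refinement approaches, such as the need for model fine-tuning and unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency.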

Findings

1. Boosts the prompt-image consistency score by up to 24.9% on the evaluated datasets
2. Preserves image quality, as measured by FID
3. Increases recall between generated and real data

Abstract

Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high-performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.…
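The abstract describes OPT2I only at a high level: an LLM repeatedly proposes revised prompts, and each revision is judged by how consistent its generated images are with the user's original prompt. The Python sketch that follows illustrates such an optimization-by-prompting loop under stated assumptions. `propose` stands in for the LLM, and `consistency` stands in for generating images from a candidate and scoring them against the original prompt. The meta-prompt template, the ascending-score ordering of the history, `top_k`, the iteration counts, and the toy scorer are all hypothetical and are not taken from the paper.

```python
import itertools
from typing import Callable, List, Tuple

# (prompt, consistency score) pairs that make up the optimization history.
History = List[Tuple[str, float]]


def build_meta_prompt(user_prompt: str, history: History, top_k: int) -> str:
    """Show the LLM the top-k prompts so far, listed in ascending score order.

    The instruction wording is a hypothetical template, not the paper's.
    """
    best = sorted(history, key=lambda pair: pair[1], reverse=True)[:top_k]
    scored = [f"{score:.2f}: {prompt}" for prompt, score in reversed(best)]
    return (
        f"Write new prompts that make generated images match '{user_prompt}' better.\n"
        "Previous prompts with their consistency scores:\n" + "\n".join(scored)
    )


def optimize_prompt(
    user_prompt: str,
    propose: Callable[[str, int], List[str]],   # hypothetical LLM: (meta-prompt, n) -> n prompts
    consistency: Callable[[str, str], float],   # hypothetical: (candidate, user_prompt) -> score
    iterations: int = 5,
    candidates_per_iter: int = 4,
    top_k: int = 5,
) -> Tuple[str, float]:
    """Iteratively ask the LLM for revised prompts and return the best-scoring one."""
    # Every candidate is scored against the ORIGINAL user prompt.
    history: History = [(user_prompt, consistency(user_prompt, user_prompt))]
    for _ in range(iterations):
        meta_prompt = build_meta_prompt(user_prompt, history, top_k)
        for candidate in propose(meta_prompt, candidates_per_iter):
            history.append((candidate, consistency(candidate, user_prompt)))
    return max(history, key=lambda pair: pair[1])


# --- Toy stand-ins so the sketch runs offline (purely illustrative) ---
STOPWORDS = {"a", "an", "the", "to", "of", "next", "and"}


def words(text: str) -> List[str]:
    return text.lower().replace(",", " ").split()


def toy_consistency(candidate: str, user_prompt: str) -> float:
    # Pretend the T2I model only renders the first five words of a prompt,
    # then score the fraction of the user's key concepts that were rendered.
    concepts = set(words(user_prompt)) - STOPWORDS
    rendered = set(words(candidate)[:5])
    return len(concepts & rendered) / len(concepts)


rewrites = iter([
    "a sphere beside a pair of red cubes",
    "two red cubes next to a sphere",
    "two red cubes, one sphere",
    "red cubes and a sphere",
])


def toy_propose(meta_prompt: str, n: int) -> List[str]:
    # A real LLM would read meta_prompt; this stub just replays fixed rewrites.
    return list(itertools.islice(rewrites, n))


best_prompt, best_score = optimize_prompt(
    "a sphere next to two red cubes", toy_propose, toy_consistency,
    iterations=2, candidates_per_iter=2,
)
print(best_prompt, best_score)  # two red cubes, one sphere 1.0
```

Keeping the full history and re-ranking it each round lets the LLM see which rewrites helped and which did not. A real setup would replace both toy functions with calls to an actual LLM, a T2I model, and a prompt-image consistency metric.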

Videos

Improving Text-to-Image Consistency via Automatic Prompt Optimization (SlidesLive)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Multimedia Communication and Technology
