FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

Liyao Jiang; Negar Hassanpour; Mohammad Salameh; Mohan Sai Singamsetti; Fengyu Sun; Wei Lu; Di Niu

arXiv:2408.11706·cs.CV·April 8, 2025

Open Access

TL;DR

FRAP introduces an adaptive prompt weighting method for text-to-image diffusion models, significantly enhancing prompt-image alignment and image realism while maintaining low latency, outperforming existing latent optimization techniques.

Contribution

The paper presents a novel online algorithm for adaptively adjusting token weights in prompts, improving alignment and authenticity in generated images without extensive latent code optimization.

Findings

01 FRAP achieves higher prompt-image alignment on complex datasets.

02 FRAP reduces generation latency by 4 seconds compared to Divide & Bind (D&B).

03 FRAP produces more realistic images, as measured by the CLIP-IQA-Real metric.

Abstract

Text-to-image (T2I) diffusion models have demonstrated impressive capabilities in generating high-quality images given a text prompt. However, ensuring the prompt-image alignment remains a considerable challenge, i.e., generating images that faithfully align with the prompt's semantics. Recent works attempt to improve the faithfulness by optimizing the latent code, which potentially could cause the latent code to go out-of-distribution and thus produce unrealistic images. In this paper, we propose FRAP, a simple, yet effective approach based on adaptively adjusting the per-token prompt weights to improve prompt-image alignment and authenticity of the generated images. We design an online algorithm to adaptively update each token's weight coefficient, which is achieved by minimizing a unified objective function that encourages object presence and the binding of object-modifier pairs.…
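The weight-update idea in the abstract can be illustrated with a small toy. The sketch below is not the paper's implementation: the attention model, the exact form of the presence and binding terms, the finite-difference gradient, the backtracking step, and every name in it (`attention`, `objective`, `update_weights`, `run_demo`) are assumptions made for illustration. It only shows the general pattern of keeping one weight per prompt token and nudging those weights, step by step, to lower a single loss that rewards object presence and object-modifier binding. Each loop iteration stands in for one denoising step.

```python
import math
import random


def attention(weights, tokens, pixels):
    """Toy cross-attention: per pixel, a softmax over weighted prompt tokens."""
    maps = []
    for px in pixels:
        logits = [w * sum(a * b for a, b in zip(px, tok))
                  for w, tok in zip(weights, tokens)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        maps.append([e / total for e in exps])
    return maps  # maps[p][t] = attention of pixel p to token t


def objective(weights, tokens, pixels, objects, pairs):
    """Hypothetical unified loss: presence term plus binding term."""
    attn = attention(weights, tokens, pixels)

    def column(t):
        return [row[t] for row in attn]

    # Presence: each object token should attend strongly to at least one pixel.
    loss = sum(1.0 - max(column(t)) for t in objects)
    # Binding: an object and its modifier should attend to the same pixels,
    # measured here as the overlap of their normalized attention maps.
    for obj, mod in pairs:
        p, q = column(obj), column(mod)
        sp, sq = sum(p), sum(q)
        loss += 1.0 - sum(min(a / sp, b / sq) for a, b in zip(p, q))
    return loss


def update_weights(weights, loss_fn, lr=1.0, eps=1e-4):
    """One online step: finite-difference gradient with backtracking."""
    base = loss_fn(weights)
    grad = []
    for i in range(len(weights)):
        up = weights[:i] + [weights[i] + eps] + weights[i + 1:]
        down = weights[:i] + [weights[i] - eps] + weights[i + 1:]
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    for _ in range(10):  # halve the step until the loss improves
        trial = [w - lr * g for w, g in zip(weights, grad)]
        if loss_fn(trial) < base:
            return trial
        lr *= 0.5
    return weights  # no improving step found; keep the current weights


def run_demo(steps=20, seed=0):
    rng = random.Random(seed)
    # Random stand-ins for 5 token embeddings and 16 image-patch features.
    tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
    pixels = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
    objects = [1, 3]          # indices of object tokens (assumed)
    pairs = [(1, 0), (3, 2)]  # (object, modifier) index pairs (assumed)

    def loss_fn(w):
        return objective(w, tokens, pixels, objects, pairs)

    weights = [1.0] * len(tokens)  # start from uniform token weights
    initial = loss_fn(weights)
    for _ in range(steps):         # one update per simulated denoising step
        weights = update_weights(weights, loss_fn)
    return initial, loss_fn(weights), weights


if __name__ == "__main__":
    initial, final, weights = run_demo()
    print(f"loss before: {initial:.4f}  after: {final:.4f}")
    print("token weights:", [round(w, 3) for w in weights])
```

Because a step is only accepted when it lowers the loss, the toy objective never increases across iterations. Only the per-token weights change; nothing resembling a latent code is optimized, which is the contrast the abstract draws with latent-optimization methods.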

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques

Methods: Diffusion · ALIGN