GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
Ashish Goswami, Satyam Kumar Modi, Santhosh Rishi Deshineni, Harman Singh, Prathosh A. P, Parag Singla

TL;DR
GraPE is a modular generate-plan-edit framework for text-to-image (T2I) synthesis. It improves fidelity to complex prompts by using a multi-modal LLM to identify mistakes in an image produced by an existing diffusion model, and an image editing model to correct them.
Contribution
It proposes a novel three-step paradigm for T2I synthesis that decomposes complex generation into generate, plan, and edit stages, leveraging multi-modal LLMs and image editing models, and develops a compositional editing model.
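The three stages can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `generate_plan_edit`, the `GrapeResult` container, and the three callables are hypothetical names, and the toy stand-ins at the bottom represent an "image" as a set of object descriptions so the flow can run without any model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class GrapeResult:
    """Final image plus the edit instructions that were applied (hypothetical container)."""
    image: Any
    applied_edits: List[str] = field(default_factory=list)


def generate_plan_edit(
    prompt: str,
    generate: Callable[[str], Any],         # assumed wrapper around a T2I diffusion model
    plan: Callable[[str, Any], List[str]],  # assumed wrapper around an MLLM that lists corrective steps
    edit: Callable[[Any, str], Any],        # assumed wrapper around a text-guided image editing model
) -> GrapeResult:
    # Generate: produce an initial image from the prompt.
    image = generate(prompt)
    # Plan: compare image and prompt, returning a sequence of corrective instructions.
    steps = plan(prompt, image)
    # Edit: apply the instructions one at a time, in order.
    applied: List[str] = []
    for instruction in steps:
        image = edit(image, instruction)
        applied.append(instruction)
    return GrapeResult(image=image, applied_edits=applied)


# Toy stand-ins (assumptions for illustration only): an image is a set of object strings.
def toy_generate(prompt: str) -> set:
    return {"red cube"}  # pretend the model missed one object


def toy_plan(prompt: str, image: set) -> List[str]:
    wanted = {"red cube", "blue sphere"}
    return ["add " + obj for obj in sorted(wanted - image)]


def toy_edit(image: set, instruction: str) -> set:
    return image | {instruction[len("add "):]}


result = generate_plan_edit(
    "a red cube next to a blue sphere", toy_generate, toy_plan, toy_edit
)
```

Passing the three stages in as callables mirrors the modularity described above: any generator, planner, or editor with a matching signature could be swapped in without changing the loop.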
Findings
Improves SOTA performance by up to 3 points on benchmarks.
Reduces performance gap between weaker and stronger models.
Flexible trade-off between inference time and accuracy.
Abstract
Text-to-image (T2I) generation has seen significant progress with diffusion models, enabling generation of photo-realistic images from text prompts. Despite this progress, existing methods still face challenges in following complex text prompts, especially those requiring compositional and multi-step reasoning. Given such complex instructions, SOTA models often make mistakes in faithfully modeling object attributes and the relationships among them. In this work, we present an alternate paradigm for T2I synthesis, decomposing the task of complex multi-step generation into three steps: (a) Generate: we first generate an image using existing diffusion models; (b) Plan: we make use of Multi-Modal LLMs (MLLMs) to identify the mistakes in the generated image, expressed in terms of individual objects and their properties, and produce a sequence of corrective steps required in the form of an…
Taxonomy
Topics: Machine Learning in Materials Science · Manufacturing Process and Optimization
Methods: Diffusion
