Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

Zhengyuan Yang; Jianfeng Wang; Linjie Li; Kevin Lin; Chung-Ching Lin; Zicheng Liu; Lijuan Wang

arXiv:2310.08541·cs.CV·August 15, 2024·2 cites

Open Access

TL;DR

Idea2Img leverages GPT-4V(ision) for iterative self-refinement to enhance automatic image design, enabling better prompt generation and image quality through multimodal feedback and exploration of T2I models.

Contribution

Introduces Idea2Img, a novel system using GPT-4V(ision) for multimodal iterative self-refinement in image generation, improving prompt effectiveness and image quality.

Findings

01. Enhanced image quality and semantic relevance.
02. Effective exploration of unknown T2I models.
03. Validated by user preference study.

Abstract

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative explorations. This enables them to efficiently convert their high-level generation ideas into effective T2I prompts that can produce good images. We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable exploring unknown models or environments via self-refining tries. Idea2Img cyclically generates revised T2I prompts to synthesize draft images, and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. The iterative self-refinement brings Idea2Img various advantages over vanilla T2I…
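
The cycle described in the abstract (revise the prompt, synthesize a draft image, give feedback, all conditioned on a memory of earlier rounds) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `refine`, `Memory`, `revise_prompt`, `synthesize`, and `give_feedback` are hypothetical, images are represented as plain strings, and the stop rule (empty feedback ends the loop) is an assumption made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """History of (prompt, image, feedback) triples from earlier rounds."""
    rounds: List[Tuple[str, str, str]] = field(default_factory=list)


def refine(
    idea: str,
    revise_prompt: Callable[[str, Memory], str],       # hypothetical LMM call
    synthesize: Callable[[str], str],                  # hypothetical T2I call
    give_feedback: Callable[[str, str, Memory], str],  # hypothetical LMM call
    max_rounds: int = 3,
) -> Tuple[str, str, Memory]:
    """Run the revise -> synthesize -> feedback cycle for up to max_rounds."""
    memory = Memory()
    prompt, image = "", ""
    for _ in range(max_rounds):
        # Revise the T2I prompt, conditioned on what earlier rounds showed.
        prompt = revise_prompt(idea, memory)
        # Synthesize a draft image from the revised prompt.
        image = synthesize(prompt)
        # Ask for directional feedback on the draft, again using the memory.
        feedback = give_feedback(idea, image, memory)
        memory.rounds.append((prompt, image, feedback))
        # Assumed stop signal: empty feedback means nothing left to revise.
        if feedback == "":
            break
    return prompt, image, memory
```

Passing the three steps in as callables keeps the sketch independent of any particular LMM or T2I API; in practice each callable would wrap whatever model client is available.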

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques