Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
Chutian Meng, Fan Ma, Jiaxu Miao, Chi Zhang, Yi Yang, Yueting Zhuang

TL;DR
This paper introduces the Image Regeneration task, which evaluates text-to-image (T2I) models by asking them to generate an image that matches a reference image, with GPT4V translating the reference image into text input for the T2I model. It also proposes the ImageRepainter framework to improve generated image quality and content comprehension.
Contribution
The study proposes a novel Image Regeneration evaluation method using multimodal large language models and introduces datasets and a framework to enhance image generation fidelity.
Findings
The Image Regeneration task provides a more reliable assessment of T2I models than metrics that directly match the input text with the generated image.
The ImageRepainter framework improves generated image quality and content accuracy.
Experiments show that MLLM-guided models produce images more similar to reference images.
Abstract
Diffusion models have revitalized the image generation domain, playing crucial roles in both academic research and artistic expression. With the emergence of new diffusion models, assessing the performance of text-to-image models has become increasingly important. Current metrics focus on directly matching the input text with the generated image, but due to cross-modal information asymmetry, this leads to unreliable or incomplete assessment results. Motivated by this, we introduce the Image Regeneration task in this study to assess text-to-image models by tasking the T2I model with generating an image according to the reference image. We use GPT4V to bridge the gap between the reference image and the text input for the T2I model, allowing T2I models to understand image content. This evaluation process is simplified as comparisons between the generated image and the reference image are…
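The evaluation loop described in the abstract (reference image, then a text description, then a regenerated image, then an image-to-image comparison) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callables `describe`, `generate`, and `embed` are hypothetical placeholders for an MLLM such as GPT4V, the T2I model under test, and an image feature extractor, and cosine similarity is an assumed comparison metric that the text above does not specify.

```python
import math
from typing import Callable, List, Sequence

# Assumption: an "image" here is any object the supplied callables understand.
# The tests and toy stubs use plain lists of floats as stand-ins.
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for degenerate all-zero features
    return dot / (norm_a * norm_b)


def regeneration_score(
    reference,
    describe: Callable[[object], str],     # hypothetical: MLLM, image -> text
    generate: Callable[[str], object],     # hypothetical: T2I model, text -> image
    embed: Callable[[object], Vector],     # hypothetical: image -> feature vector
) -> float:
    """Score one reference image: describe it, regenerate it, compare the two."""
    prompt = describe(reference)           # bridge image content into text
    generated = generate(prompt)           # T2I model under evaluation
    # The comparison is image-to-image, not text-to-image.
    return cosine_similarity(embed(reference), embed(generated))


def evaluate_model(
    references: List[object],
    describe: Callable[[object], str],
    generate: Callable[[str], object],
    embed: Callable[[object], Vector],
) -> float:
    """Average regeneration score over a set of reference images."""
    if not references:
        raise ValueError("need at least one reference image")
    scores = [regeneration_score(r, describe, generate, embed) for r in references]
    return sum(scores) / len(scores)
```

In this sketch the score depends only on the generated and reference images, so the text prompt serves as an intermediate step rather than as the thing being scored.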
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Focus · Diffusion
