Visual question answering based evaluation metrics for text-to-image generation
Mizuki Miyamoto, Ryugo Morita, Jinjia Zhou

TL;DR
This paper introduces evaluation metrics for text-to-image generation that use question generation and visual question answering (VQA) to assess fine-grained text-image alignment, alongside image quality.
Contribution
The paper proposes a novel evaluation framework combining question-based assessment and image quality metrics for more precise evaluation of text-to-image models.
Findings
The proposed metrics outperform existing methods in assessing text-image alignment.
The approach allows for adjustable weighting between alignment and image quality.
Experimental results validate the effectiveness of the new evaluation approach.
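The adjustable weighting between alignment and image quality can be pictured as a weighted sum of two scores. The sketch below is only an illustration under stated assumptions: the function name `combined_score`, the `weight` parameter, the convex-combination form, and the requirement that both inputs are already scaled to [0, 1] are hypothetical and are not taken from the paper, which may define the combination differently.

```python
def combined_score(alignment: float, quality: float, weight: float = 0.5) -> float:
    """Blend an alignment score and an image-quality score.

    Assumption (not from the paper): both scores lie in [0, 1] and are
    mixed as a convex combination controlled by `weight`.
    weight = 1.0 uses alignment only; weight = 0.0 uses quality only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * alignment + (1.0 - weight) * quality


if __name__ == "__main__":
    # Emphasise alignment over quality with weight = 0.75.
    print(combined_score(alignment=0.8, quality=0.4, weight=0.75))
```

Moving `weight` toward 1.0 favours text-image alignment, while moving it toward 0.0 favours image quality.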
Abstract
Text-to-image generation and text-guided image manipulation have received considerable attention in the field of image generation. However, the mainstream evaluation methods for these tasks mainly assess the overall alignment between the input text and the generated images, and they have difficulty determining whether all the information in the input text is accurately reflected in those images. This paper proposes new evaluation metrics that assess the alignment between the input text and the generated images for every individual object. First, ChatGPT is used to produce questions about the generated images based on the input text. Then, Visual Question Answering (VQA) is used to measure the relevance of the generated images to the input text, which allows for a more detailed evaluation of alignment than existing methods. In addition, we…
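The question-then-answer pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `vqa_alignment_score`, the exact-match comparison, the "fraction of questions answered as expected" scoring rule, and the stub `toy_vqa` are all assumptions introduced here. In practice the question/answer pairs would come from a language model and `answer_fn` would wrap a real VQA model.

```python
from typing import Callable, Sequence, Tuple


def vqa_alignment_score(
    image: object,
    qa_pairs: Sequence[Tuple[str, str]],
    answer_fn: Callable[[object, str], str],
) -> float:
    """Fraction of questions whose VQA answer matches the expected answer.

    qa_pairs  : (question, expected_answer) pairs derived from the input text.
    answer_fn : hypothetical wrapper around a VQA model; takes (image, question)
                and returns an answer string.
    """
    if not qa_pairs:
        raise ValueError("at least one question is required")
    correct = sum(
        1
        for question, expected in qa_pairs
        if answer_fn(image, question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)


def toy_vqa(image: object, question: str) -> str:
    """Stand-in for a VQA model: 'sees' only the objects listed in `image`."""
    return "yes" if any(obj in question.lower() for obj in image) else "no"


if __name__ == "__main__":
    # Prompt: "a dog next to a red ball"; the toy image contains only a dog.
    questions = [("Is there a dog?", "yes"), ("Is there a ball?", "yes")]
    print(vqa_alignment_score(["dog"], questions, toy_vqa))
```

Because each question targets one object or attribute, a missing object lowers the score directly, which is the per-object sensitivity that a single overall similarity score lacks.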
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Motion and Animation
Methods: Softmax · Attention Is All You Need · Focus
