Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models

Jia-Hong Huang; Hongyi Zhu; Yixian Shen; Stevan Rudinac; Evangelos Kanoulas

arXiv:2411.05706·cs.CV·November 11, 2024

TL;DR

The paper introduces Image2Text2Image, a label-free evaluation framework for image captioning. A diffusion model regenerates an image from the generated caption, and that image is compared with the original. The resulting scores correlate well with human judgment.

Contribution

The paper presents a reference-free evaluation method that uses diffusion models to assess image captioning quality without human-annotated reference captions.

Findings

01 High correlation with human evaluation

02 Effective in identifying captioning weaknesses

03 Does not require reference captions

Abstract

Evaluating the quality of automatically generated image descriptions is a complex task that requires metrics capturing various dimensions, such as grammaticality, coverage, accuracy, and truthfulness. Although human evaluation provides valuable insights, its cost and time-consuming nature pose limitations. Existing automated metrics like BLEU, ROUGE, METEOR, and CIDEr attempt to fill this gap, but they often exhibit weak correlations with human judgment. To address this challenge, we propose a novel evaluation framework called Image2Text2Image, which leverages diffusion models, such as Stable Diffusion or DALL-E, for text-to-image generation. In the Image2Text2Image framework, an input image is first processed by a selected image captioning model, chosen for evaluation, to generate a textual description. Using this generated description, a diffusion model then creates a new image. By…
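
The pipeline described above (image, then caption, then regenerated image, then comparison) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `caption_model`, `text_to_image`, and `encode_image` are hypothetical callables standing in for the captioner under evaluation, a diffusion model, and an image feature extractor, and the use of cosine similarity over feature vectors is an assumption, since the abstract is cut off before it specifies how the two images are compared.

```python
from math import sqrt
from typing import Any, Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def image2text2image_score(
    image: Any,
    caption_model: Callable[[Any], str],      # hypothetical: captioner being evaluated
    text_to_image: Callable[[str], Any],      # hypothetical: diffusion model wrapper
    encode_image: Callable[[Any], Vector],    # hypothetical: image feature extractor
) -> float:
    """Score a captioner on one image without any reference caption."""
    caption = caption_model(image)            # step 1: image -> text
    regenerated = text_to_image(caption)      # step 2: text -> image
    # Step 3 (assumed): compare original and regenerated images in feature space.
    return cosine_similarity(encode_image(image), encode_image(regenerated))


# Toy stand-ins so the sketch runs without any model: "images" are plain vectors.
toy_captioner = lambda img: "a toy caption"
toy_generator = lambda text: [1.0, 0.0]
toy_encoder = lambda img: img

score_same = image2text2image_score([1.0, 0.0], toy_captioner, toy_generator, toy_encoder)
score_diff = image2text2image_score([0.0, 1.0], toy_captioner, toy_generator, toy_encoder)
```

In practice the three callables would wrap real models; a higher score would then suggest that the caption preserved enough of the original image's content for the diffusion model to reconstruct something similar.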

Taxonomy

Topics: Computational and Text Analysis Methods · Image Retrieval and Classification Techniques · AI in cancer detection

Methods: Diffusion