Holistic Evaluation for Interleaved Text-and-Image Generation

Minqian Liu; Zhiyang Xu; Zihao Lin; Trevor Ashby; Joy Rimchala; Jiaxin Zhang; Lifu Huang

arXiv:2406.14643·cs.CV·October 10, 2024

TL;DR

This paper introduces InterleavedBench, a benchmark for evaluating interleaved text-and-image generation across diverse tasks, and InterleavedEval, a GPT-4o-powered metric whose scores correlate highly with human judgment.

Contribution

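The paper presents the first dedicated benchmark and a reference-free evaluation metric for interleaved text-and-image generation. Together they address two limitations of prior work: existing benchmarks do not support arbitrarily interleaved inputs and outputs and cover few domains, and similarity-based metrics fall short in open-ended scenarios.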

Findings

1. InterleavedBench covers diverse real-world use cases.

2. InterleavedEval correlates strongly with human judgments.

3. The proposed methods outperform existing metrics.
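
A claim that a metric "correlates with human judgments" is usually checked by scoring the same outputs with the metric and with human raters, then computing a rank correlation between the two lists. The sketch below illustrates that check with Spearman's rank correlation. It is only an illustration: this page does not say which correlation statistic the paper reports, the function names are hypothetical, and the scores are made-up numbers, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_ratings):
    """Spearman correlation: Pearson correlation of the two rank lists."""
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))


# Hypothetical scores for five generated outputs (not data from the paper).
metric_scores = [4.5, 2.0, 3.5, 1.0, 5.0]
human_ratings = [4, 2, 3, 1, 5]
rho = spearman(metric_scores, human_ratings)
```

Here the metric orders the five outputs exactly as the human raters do, so `rho` is at its maximum of 1; a metric that ranks outputs differently from the raters would score lower.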

Abstract

Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs and outputs, and they only cover a limited number of domains and use cases. Also, current works predominantly use similarity-based metrics which fall short in assessing the quality in open-ended scenarios. To this end, we introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation. InterleavedBench features a rich array of tasks to cover diverse real-world use cases. In addition, we present InterleavedEval, a strong…

Code & Models

Models

mqliu/InterleavedBench (Hugging Face)

Videos

Holistic Evaluation for Interleaved Text-and-Image Generation (Underline)

Taxonomy

Topics: Video Analysis and Summarization