OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Pengfei Zhou; Xiaopeng Peng; Jiajun Song; Chuanhao Li; Zhaopan Xu; Yue Yang; Ziyao Guo; Hao Zhang; Yuqi Lin; Yefei He; Lirui Zhao; Shuo Liu; Tianhua Li; Yuxuan Xie; Xiaojun Chang; Yu Qiao; Wenqi Shao; Kaipeng Zhang

arXiv:2411.18499 · cs.CV · April 1, 2025

TL;DR

OpenING is a new comprehensive benchmark with 5,400 instances across 56 tasks designed to evaluate interleaved image-text generation, along with IntJudge, a model for assessing these outputs, revealing significant room for improvement in current methods.

Contribution

The paper introduces OpenING, a large-scale, diverse benchmark for interleaved image-text generation, and presents IntJudge, an effective evaluation model surpassing GPT-based evaluators.

Findings

01. Current interleaved image-text generation methods show substantial room for improvement on OpenING.
02. IntJudge achieves 82.42% agreement with human judgments.
03. OpenING covers diverse real-world scenarios for robust evaluation.
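
The agreement figure reported for IntJudge can be read as a simple match rate between judge verdicts and human verdicts. This page does not give the paper's exact definition, so the following is only a minimal sketch. It assumes that each evaluation item is a comparison with a single categorical verdict and that agreement is the fraction of items where the two verdicts coincide. The function name `agreement_rate` and the sample verdicts are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of comparisons where the judge verdict equals the human verdict.

    Assumption: one categorical verdict per comparison (e.g. "A", "B", "tie").
    """
    if len(judge) != len(human):
        raise ValueError("judge and human verdict lists must have the same length")
    if not judge:
        raise ValueError("need at least one comparison")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


# Hypothetical verdicts for five comparisons (illustrative only, not real data).
human_verdicts = ["A", "B", "A", "tie", "B"]
judge_verdicts = ["A", "B", "B", "tie", "B"]

rate = agreement_rate(judge_verdicts, human_verdicts)
print(f"{rate:.2%}")  # 4 of 5 verdicts match
```

Under this reading, a score of 82.42% would mean the judge and the human annotators chose the same verdict on roughly five of every six comparisons.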

Abstract

Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding and generation tasks. However, generating interleaved image-text content remains a challenge, which requires integrated multimodal understanding and generation abilities. While the progress in unified models offers new solutions, existing benchmarks are insufficient for evaluating these methods due to limitations in data size and diversity. To bridge this gap, we introduce OpenING, a comprehensive benchmark comprising 5,400 high-quality human-annotated instances across 56 real-world tasks. OpenING covers diverse daily scenarios such as travel guide, design, and brainstorming, offering a robust platform for challenging interleaved generation methods. In addition, we present IntJudge, a judge model for evaluating open-ended multimodal generation methods. Trained with a novel data pipeline, our…

Code & Models

Repositories

LanceZPF/OpenING
pytorch

Models

IntJudge/IntJudge (Hugging Face model · 6 downloads · 1 like)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Digital Storytelling and Education