MMGenBench: Fully Automatically Evaluating LMMs from the Text-to-Image Generation Perspective

Hailang Huang; Yong Wang; Zixuan Huang; Huaqiu Li; Tongwen Huang; Xiangxiang Chu; Richong Zhang

arXiv:2411.14062 · cs.CV · March 11, 2025

TL;DR

This paper introduces MMGenBench, a fully automated evaluation pipeline for large multimodal models (LMMs). The model under test describes an input image, a text-to-image model regenerates an image from that description, and the original and regenerated images are compared.

Contribution

The authors develop MMGenBench-Pipeline, a fully automated evaluation pipeline, together with two benchmarks: MMGenBench-Test, which covers 13 distinct image patterns, and MMGenBench-Domain, which focuses on generative image performance.

Findings

1. Many LMMs that perform well on existing benchmarks underperform on image understanding tasks.
2. The pipeline evaluates LMMs across 13 image patterns and multiple domains.
3. The results show significant room for improvement in the image description abilities of current LMMs.

Abstract

Large Multimodal Models (LMMs) demonstrate impressive capabilities. However, current benchmarks predominantly focus on image comprehension in specific domains, and these benchmarks are labor-intensive to construct. Moreover, their answers tend to be brief, making it difficult to assess the ability of LMMs to generate detailed descriptions of images. To address these limitations, we propose the MMGenBench-Pipeline, a straightforward and fully automated evaluation pipeline. This involves generating textual descriptions from input images, using these descriptions to create auxiliary images via text-to-image generative models, and then comparing the original and generated images. Furthermore, to ensure the effectiveness of MMGenBench-Pipeline, we design MMGenBench-Test, evaluating LMMs across 13 distinct image patterns, and MMGenBench-Domain, focusing on generative image performance. A…
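The three steps described in the abstract (describe the image, regenerate it from the description, compare the two images) can be sketched as a small harness. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `evaluate_lmm`, `toy_describe`, `toy_generate`, and `toy_embed` are hypothetical, the "images" are plain lists of floats, and cosine similarity over embeddings is an assumed comparison metric that the text above does not specify.

```python
import math
from typing import Callable, List, Sequence

# Assumption: an "image" is modeled as a flat list of pixel intensities.
Image = Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def evaluate_lmm(
    images: List[Image],
    describe: Callable[[Image], str],           # hypothetical: the LMM under test
    generate: Callable[[str], Image],           # hypothetical: a text-to-image model
    embed: Callable[[Image], Sequence[float]],  # hypothetical: an image encoder
) -> float:
    """Average similarity between each original image and its regeneration."""
    scores = []
    for image in images:
        prompt = describe(image)        # step 1: image -> textual description
        regenerated = generate(prompt)  # step 2: description -> auxiliary image
        # Step 3: compare original and regenerated images in embedding space.
        scores.append(cosine_similarity(embed(image), embed(regenerated)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy stand-ins so the sketch runs without any real models.
def toy_describe(image: Image) -> str:
    return "bright" if sum(image) / len(image) > 0.5 else "dark"


def toy_generate(prompt: str) -> Image:
    return [1.0] * 4 if prompt == "bright" else [0.1] * 4


def toy_embed(image: Image) -> Sequence[float]:
    return list(image)


score = evaluate_lmm(
    [[0.9, 0.8, 0.9, 0.8], [0.1, 0.2, 0.1, 0.2]],
    toy_describe,
    toy_generate,
    toy_embed,
)
print(f"mean similarity: {score:.3f}")
```

Passing the three models in as callables keeps the harness independent of any particular LMM or text-to-image model. In a real run, the toy functions would be replaced by calls to actual models and an actual image encoder.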

Code & Models

Repositories

lerogo/mmgenbench (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing

Methods: Focus