MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
Yuxuan Luo, Yuhui Yuan, Junwen Chen, Haonan Cai, Ziyi Yue, Yuwei Yang, Fatima Zohra Daha, Ji Li, Zhouhui Lian

TL;DR
This paper introduces MMMG, a comprehensive benchmark for evaluating the reasoning capabilities of text-to-image models in generating knowledge images across multiple disciplines and educational levels, highlighting current model limitations.
Contribution
The paper presents a new task of knowledge image generation, a large multi-disciplinary benchmark (MMMG), and a novel evaluation metric (MMMG-Score) for assessing reasoning in image generation models.
Findings
State-of-the-art models show significant reasoning deficits.
GPT-4o achieves an MMMG-Score of only 50.20.
The baseline FLUX-Reason scores 34.45, leaving substantial room for improvement.
Abstract
In this paper, we introduce knowledge image generation as a new task, alongside the Massive Multi-Discipline Multi-Tier Knowledge-Image Generation Benchmark (MMMG) to probe the reasoning capability of image generation models. Knowledge images have been central to human civilization and to the mechanisms of human learning, a fact underscored by dual-coding theory and the picture-superiority effect. Generating such images is challenging, demanding multimodal reasoning that fuses world knowledge with pixel-level grounding into clear explanatory visuals. To enable comprehensive evaluation, MMMG offers 4,456 expert-validated (knowledge) image-prompt pairs spanning 10 disciplines, 6 educational levels, and diverse knowledge formats such as charts, diagrams, and mind maps. To eliminate confounding complexity during evaluation, we adopt a unified Knowledge Graph (KG) representation. Each KG…
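The abstract is cut off before it describes what each KG contains. As a minimal sketch, assuming a KG is stored as a set of (head, relation, tail) triples, the example below shows why a single shared format can simplify evaluation: a reference KG and a KG read back from a generated image can be compared with plain set operations. The `Triple` type, the `triple_overlap` helper, the toy photosynthesis content, and the precision/recall/F1 comparison are hypothetical illustrations chosen here; they are not the paper's MMMG-Score, whose definition is not given in this excerpt.

```python
# Hypothetical illustration: a knowledge graph as a set of
# (head, relation, tail) triples. This is NOT the MMMG-Score;
# it only shows how a shared KG format makes comparison simple.
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def triple_overlap(reference: set, generated: set) -> dict:
    """Illustrative precision/recall/F1 of generated triples against a reference KG."""
    matched = reference & generated  # triples present in both graphs
    precision = len(matched) / len(generated) if generated else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up content: a photosynthesis diagram.
reference = {
    Triple("chloroplast", "hosts", "photosynthesis"),
    Triple("photosynthesis", "consumes", "CO2"),
    Triple("photosynthesis", "produces", "O2"),
}
generated = {
    Triple("chloroplast", "hosts", "photosynthesis"),
    Triple("photosynthesis", "produces", "O2"),
    Triple("photosynthesis", "produces", "CO2"),  # factual error in the "image"
}
print(triple_overlap(reference, generated))
```

Here the generated graph turns an input into an output, so one of its three triples fails to match. A real evaluator would also need a way to extract triples from pixels, which this sketch does not attempt.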
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks
Methods: ADaptive gradient method with the OPTimal convergence rate · Diffusion
