IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation

Yinghao Tang; Xueding Liu; Boyuan Zhang; Tingfeng Lan; Yupeng Xie; Jiale Lao; Yiyao Wang; Haoxuan Li; Tingting Gao; Bo Pan; Luoxuan Weng; Xiuqi Huang; Minfeng Zhu; Yingchaojie Feng; Yuyu Luo; and Wei Chen

arXiv:2601.04498 · cs.LG · January 9, 2026


TL;DR

This paper introduces IGENBENCH, a comprehensive benchmark for assessing the reliability of text-to-infographic generation models, revealing significant challenges and bottlenecks in current state-of-the-art systems.

Contribution

It presents the first systematic benchmark and evaluation framework for reliability in text-to-infographic generation, including an automated verification method using multimodal large language models.

Findings

01. The top-performing model achieves 0.90 Q-ACC but only 0.49 I-ACC.
02. Data completeness is a major bottleneck, scoring only 0.21.
03. End-to-end correctness remains a significant challenge.

Abstract

Infographics are composite visual artifacts that combine data visualizations with textual and illustrative elements to communicate information. While recent text-to-image (T2I) models can generate aesthetically appealing images, their reliability in generating infographics remains unclear. Generated infographics may appear correct at first glance but contain easily overlooked issues, such as distorted data encoding or incorrect textual content. We present IGENBENCH, the first benchmark for evaluating the reliability of text-to-infographic generation, comprising 600 curated test cases spanning 30 infographic types. We design an automated evaluation framework that decomposes reliability verification into atomic yes/no questions based on a taxonomy of 10 question types. We employ multimodal large language models (MLLMs) to verify each question, yielding question-level accuracy (Q-ACC) and…
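The relationship between per-question verdicts and the aggregate scores named in the abstract can be illustrated with a minimal Python sketch. It assumes that each atomic yes/no question yields a single pass/fail verdict from the MLLM judge and that Q-ACC is the fraction of all questions that pass (a micro-average). Because the abstract's definition of the second metric is cut off, the infographic-level score here is an assumption: an infographic counts as correct only when every one of its questions passes. The `Verdict` record and both function names are hypothetical and are not taken from IGenBench's released code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Verdict:
    """One atomic yes/no check on one generated infographic (hypothetical record type)."""
    infographic_id: str
    question: str
    passed: bool  # True if the MLLM judge answered "yes"


def question_accuracy(verdicts: list[Verdict]) -> float:
    """Q-ACC as assumed here: passed questions divided by all questions (micro-average)."""
    if not verdicts:
        return 0.0
    return sum(v.passed for v in verdicts) / len(verdicts)


def infographic_accuracy_strict(verdicts: list[Verdict]) -> float:
    """Assumed infographic-level score: an infographic counts only if all its questions pass."""
    all_passed: dict[str, bool] = defaultdict(lambda: True)
    for v in verdicts:
        # A single failed question marks the whole infographic as incorrect.
        all_passed[v.infographic_id] = all_passed[v.infographic_id] and v.passed
    if not all_passed:
        return 0.0
    return sum(all_passed.values()) / len(all_passed)


# Toy example: two infographics with four checks each; one check fails on "b".
verdicts = [Verdict("a", f"q{i}", True) for i in range(4)]
verdicts += [Verdict("b", f"q{i}", i != 2) for i in range(4)]
print(question_accuracy(verdicts))            # 0.875
print(infographic_accuracy_strict(verdicts))  # 0.5
```

Under this strict all-pass assumption, one failed check sinks the whole infographic, which is one way a high question-level score can coexist with a much lower infographic-level score.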


Datasets

Brookseeworld/IGenBench-Dataset
dataset · 642 downloads


Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Generative Adversarial Networks and Image Synthesis