InfoDet: A Dataset for Infographic Element Detection

Jiangning Zhu; Yuxing Zhou; Zheng Wang; Juntao Yao; Yima Gu; Yuhui Yuan; Shixia Liu

arXiv:2505.17473·cs.CV·October 17, 2025

TL;DR

This paper introduces InfoDet, a large dataset of infographics with extensive annotations, to improve visual grounding and object detection in charts and infographic elements for vision-language models.

Contribution

The creation of InfoDet, a comprehensive dataset with over 14 million annotations for infographic elements, supporting advancements in chart understanding and object detection.

Findings

01

InfoDet enhances chart understanding in vision-language models.

02

The dataset improves object detection accuracy for infographic elements.

03

Application of models to document layout and UI detection demonstrates versatility.

Abstract

Given the central role of charts in scientific, business, and communication contexts, enhancing the chart understanding capabilities of vision-language models (VLMs) has become increasingly critical. A key limitation of existing VLMs lies in their inaccurate visual grounding of infographic elements, including charts and human-recognizable objects (HROs) such as icons and images. However, chart understanding often requires identifying relevant elements and reasoning over them. To address this limitation, we introduce InfoDet, a dataset designed to support the development of accurate object detection models for charts and HROs in infographics. It contains 11,264 real and 90,000 synthetic infographics, with over 14 million bounding box annotations. These annotations are created by combining the model-in-the-loop and programmatic methods. We demonstrate the usefulness of InfoDet through…
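The dataset figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up here, and the annotation count is treated as a lower bound because the abstract says "over 14 million".

```python
# Figures taken from the abstract; variable names are illustrative.
REAL_INFOGRAPHICS = 11_264
SYNTHETIC_INFOGRAPHICS = 90_000
MIN_BOX_ANNOTATIONS = 14_000_000  # "over 14 million", so a lower bound

# Total number of infographics in the dataset.
total = REAL_INFOGRAPHICS + SYNTHETIC_INFOGRAPHICS

# Fraction of the dataset that is synthetic.
synthetic_share = SYNTHETIC_INFOGRAPHICS / total

# Lower bound on the average number of boxes per infographic.
min_boxes_per_image = MIN_BOX_ANNOTATIONS / total

print(f"total infographics: {total}")
print(f"synthetic share: {synthetic_share:.1%}")
print(f"boxes per image (at least): {min_boxes_per_image:.0f}")
```

The total comes to 101,264 infographics, of which roughly 89% are synthetic, with at least about 138 bounding boxes per infographic on average.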

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper’s methodological rigor and practical significance are outstanding. The hybrid annotation strategy—combining synthetic and real data under a model-in-the-loop refinement—balances scalability and accuracy, achieving dataset quality comparable to COCO. The scale (over 100k infographics) and fine-grained annotations (charts, HROs, sub-elements) fill a clear research gap. The grounded CoT prompting demonstrates genuine insight into how structured visual grounding improves reasoning in moder

Weaknesses

First, the paper lacks quantitative validation for synthetic data fidelity and bias, which is a critical limitation given that nearly 90% of InfoDet consists of synthetic infographics. Without quantitative analysis or cross-domain alignment metrics—such as style distribution, semantic bias, or embedding similarity—the fidelity of synthetic data relative to real samples remains uncertain, casting doubt on downstream generalization and fairness. Second, there is insufficient analysis of annotation

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper targets a real and under-served pain point in multimodal/chart understanding: current VLMs and chart/infographic QA systems often fail not because they cannot “reason,” but because they cannot reliably ground the relevant regions in cluttered infographic-style inputs.
2. The proposed InfoDet dataset is both large (≈101K images, ≈14M boxes) and unusually fine-grained, covering text, charts, and human-recognizable objects (HROs), as well as 26 chart-level marks and 75 chart types. This

Weaknesses

1. The core novelty lies in building a large, high-quality dataset and a reasonable model-in-the-loop pipeline, plus a demonstrative prompting scheme. Compared to typical ICLR work, the methodological/learning novelty is modest.
2. Given that the dataset provides structured and layered infographic elements, the paper could reasonably be expected to propose a model or training scheme that explicitly exploits this structure (for example, through element-level selection, layout-aware fusion, or hie

Reviewer 03 · Rating 6 · Confidence 4

Strengths

**1. Comprehensive and Well-Designed Dataset**
- First large-scale infographic dataset (101,264 samples vs. prior Borkin et al. 393 samples) strategically combining real and synthetic data for authenticity and scalability
- Efficient model-in-the-loop annotation achieving quality comparable to COCO (precision 93.9%, recall 96.7% vs. COCO's 71.9%/83.0%)
- Multi-level annotations: element-level (charts, HROs) and mark-level (26 sub-element categories) providing fine-grained labels
- Verified diver
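The excerpt above does not say how the annotation precision and recall were measured. One common way to score a set of box annotations against a reference set is greedy IoU matching, sketched below. The 0.5 threshold, the `(x1, y1, x2, y2)` box format, and the function names are assumptions for illustration, not the authors' protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    annotated: List[Box], reference: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each annotated box to its best unmatched reference box."""
    unmatched = list(range(len(reference)))
    true_positives = 0
    for box in annotated:
        best_index, best_score = None, threshold
        for j in unmatched:
            score = iou(box, reference[j])
            if score >= best_score:
                best_index, best_score = j, score
        if best_index is not None:
            unmatched.remove(best_index)  # each reference box matches once
            true_positives += 1
    precision = true_positives / len(annotated) if annotated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical boxes: two annotations overlap a reference box, one is spurious.
annotated_boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
reference_boxes = [(0, 0, 10, 10), (21, 21, 31, 31)]
precision, recall = precision_recall(annotated_boxes, reference_boxes)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case two of the three annotated boxes match a reference box, so precision is 2/3 and recall is 1.0. Greedy matching is order-dependent; an optimal assignment could be substituted if that matters.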

Weaknesses

**1. Dataset Construction Issues: Representativeness and Transparency**
- **Fine-grained annotation imbalance**: 75 chart types exist only for synthetic infographics. Authors mention GPT-4o achieved only 61.49% accuracy on real infographics but provide no alternative approach (human annotation? better models?), leaving dataset incomplete.
- **Annotation process opaque**: No information on expert demographics (number? background: medical imaging experts? graphic designers? CV researchers?), train

Code & Models

Repositories

Datasets

OrionBench/OrionBench
dataset· 232 dl
